Jinkun Cao (@jinkuncao) 's Twitter Profile
Jinkun Cao

@jinkuncao

PhD student at Carnegie Mellon, working on robotics and 3D vision. Actively looking for full-time research positions.

ID: 2478547765

Link: http://www.jinkuncao.com · Joined: 05-05-2014 15:52:50

108 Tweets

523 Followers

302 Following

Han Xue (@hanxue012) 's Twitter Profile Photo

Humans can easily perform complex contact-rich tasks with vision and touch, but these tasks remain challenging for robots. How can we resolve this from both the algorithm side and the data side? Introducing Reactive Diffusion Policy (RDP), a slow-fast imitation learning algorithm

Hao-Shu Fang (@haoshu_fang) 's Twitter Profile Photo

Super fun chatting with Chris Paxton and Michael Cho - Rbt/Acc about AnyDexGrasp! 🚀 We talked about how to make robots grasp like humans — fast, efficient, and across different hands. Big thanks to both of them for the great conversation and for digging into the details! Check it out!

Yi Zhou (@papagina_yi) 's Twitter Profile Photo

🚀 Struggling with the lack of high-quality data for AI-driven human-object interaction research? We've got you covered! Introducing HUMOTO, a groundbreaking 4D dataset for human-object interaction, developed with a combination of wearable motion capture, SOTA 6D pose

Tairan He (@tairanhe99) 's Twitter Profile Photo

Excited to be at #ICRA this week! Working on humanoids, RL, or sim-to-real? Let’s grab coffee—DMs are open. See you there! Presentation for: HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots 📍 Room 307 (Regular Session WeET6: Learning for Legged Locomotion 1) ⏰

Jeff Li (@jiefengli_jeff) 's Twitter Profile Photo

📣📣📣 Excited to share GENMO: A Generalist Model for Human Motion. Words can’t perfectly describe human motion—so we build GENMO. It’s everything to motion. 🔥Video, Text, Music, Audio, Keyframes, Spatial Control…🔥 -- GENMO handles it all within a single model. 📹 Two

Xun Huang (@xunhuang1995) 's Twitter Profile Photo

What exactly is a "world model"? And what limits existing video generation models from being true world models? In my new blog post, I argue that a true video world model must be causal, interactive, persistent, real-time, and physically accurate. xunhuang.me/blogs/world_mo…

Ruihan Yang (@rchalyang) 's Twitter Profile Photo

Turns out, when we discuss "humanoid robot," everyone's picturing something totally different. So I made this figure, and next time, I'll show it before the discussion.

David Park (@park_jinhyung1) 's Twitter Profile Photo

Introducing ATLAS: A high-fidelity, parametric human body model enabling precise, independent control of surface and skeletal attributes for character creation. To be presented at #ICCV2025! Learn more about ATLAS here: jindapark.github.io/projects/atlas/

Yuda Song @ ICLR 2025 (@yus167) 's Twitter Profile Photo

🤖 Robots rarely see the world's true state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)

Rohan Choudhury (@rchoudhury997) 's Twitter Profile Photo

Excited to release our new preprint - we introduce Adaptive Patch Transformers (APT), a method to speed up vision transformers by using multiple different patch sizes within the same image!

Zhengyi “Zen” Luo (@zhengyiluo) 's Twitter Profile Photo

Humanoids need a single, generalist control policy for all of their physical tasks, not a new one for every new chore or demo. A policy for walking can't dance. A policy for dancing can't support mowing the lawn. We need to scale up humanoid control for diverse behaviors, just

Jinkun Cao (@jinkuncao) 's Twitter Profile Photo

Hao-Shu was my first mentor when I started in AI nearly 10 years ago, and he has been one of the best senior researchers I could always learn from. Go apply!

Jinkun Cao (@jinkuncao) 's Twitter Profile Photo

The results are truly impressive. I also appreciate the research approach that seeks to answer "what is enough" rather than simply stacking one "novel" piece on top of another while leaving readers with more questions.

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detection, segmentation, and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: go.meta.me/591040 2️⃣ SAM 3D

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Introducing SAM 3D, the newest addition to the SAM collection, bringing common sense 3D understanding of everyday images. SAM 3D includes two models: 🛋️ SAM 3D Objects for object and scene reconstruction 🧑‍🤝‍🧑 SAM 3D Body for human pose and shape estimation Both models achieve

CMU Center for Perceptual Computing and Learning (@robovisioncmu) 's Twitter Profile Photo

New model from Meta, SAM 3D Body, powered by people from Smith Hall (Kris Kitani, Jinkun Cao, David Park, Jyun-Ting Song) of course! #goSmithHall Introducing SAM 3D: a New Standard for 3D Object & Human Reconstruction ... youtu.be/B7PZuM55ayc?si… via YouTube

Wildminder (@wildmindai) 's Twitter Profile Photo

ComfyUI-SAM3DBody: single-image full-body 3D human mesh recovery; uses the Momentum Human Rig (MHR) for SOTA accuracy on in-the-wild poses. github.com/PozzettiAndrea…

AI at Meta (@aiatmeta) 's Twitter Profile Photo

SAM 3D is helping advance the future of rehabilitation. See how researchers at Carnegie Mellon University are using SAM 3D to capture and analyze human movement in clinical settings, opening the doors to personalized, data-driven insights in the recovery process. 🔗 Learn more about SAM