Hongsuk Benjamin Choi (@redstone_hong)'s Twitter Profile
Hongsuk Benjamin Choi

@redstone_hong

ID: 879115184458969089

Link: https://hongsukchoi.github.io/ | Joined: 25-06-2017 23:12:47

24 Tweets

72 Followers

96 Following

Jason Liu (@jasonjzliu)

Robot data is expensive and hard to scale. But what if we could collect rich, diverse demos with just our hands? 🙌 Our latest work, DexWild, shows how co-training on large-scale human data 💪 + robot data 🦾 enables strong generalization across tasks, scenes, and embodiments.

Max Fu (@letian_fu)

Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories, trainable with diffusion policies and VLA models.

Yi Zhou (@papagina_yi)

🚀 Struggling with the lack of high-quality data for AI-driven human-object interaction research? We've got you covered! Introducing HUMOTO, a groundbreaking 4D dataset for human-object interaction, developed with a combination of wearable motion capture, SOTA 6D pose

Junyi Zhang (@junyi42)

Very impressive! At VideoMimic.net, we already learn locomotion from 3rd-person human videos + RL. Excited to see where this path goes next!

Brett Adcock (@adcock_brett)

Singapore's Sharpa unveiled SharpaWave, a lifelike robotic hand
— 22 DOF balancing dexterity and strength
— 1,000+ tactile sensing pixels per fingertip with 5 mN pressure sensitivity
— AI models adapt the hand's grip and modulate force

The Humanoid Hub (@thehumanoidhub)

In their latest video, Boston Dynamics’s AI team explains how they make the Atlas humanoid perceive and interact with the world. Atlas uses an agile perception system to understand both the shape and context of objects in complex environments. Atlas combines 2D and 3D

Younggyo Seo (@younggyoseo)

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

Seohong Park (@seohong_park)

We found a way to do RL *only* with BC policies.

The idea is simple:

1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG (see the sketch below)

Here, z can be anything (e.g., goals for goal-conditioned RL).

🧵↓
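
A minimal sketch of the three-step recipe above, assuming both BC policies output action log-probabilities over a discrete action set; the guidance weight w and the log-softmax renormalization are illustrative assumptions, not the authors' implementation:

import torch

def cfg_amplified_log_probs(bc_log_probs, cond_bc_log_probs, w=3.0):
    # bc_log_probs:      log π(a|s)    from the unconditional BC policy (step 1)
    # cond_bc_log_probs: log π(a|s, z) from the conditional BC policy (step 2)
    # Step 3: classifier-free-guidance-style amplification of their difference:
    #   log π_w(a|s, z) ∝ log π(a|s) + w * (log π(a|s, z) - log π(a|s))
    guided = bc_log_probs + w * (cond_bc_log_probs - bc_log_probs)
    return torch.log_softmax(guided, dim=-1)  # renormalize over the action set
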
Lucky Iyinbor (@luckyballa)

So Flow Matching is *just*:
xt = mix(x0, x1, t)
loss = mse((x1 - x0) - nn(xt, t))
Nice, here it is in a fragment shader :) shadertoy.com/view/tfdXRM
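
The same two lines as a hedged PyTorch sketch (my reading: x0 is noise, x1 is data, and velocity_net is a stand-in name for nn(xt, t)):

import torch

def flow_matching_loss(velocity_net, x0, x1):
    # x0: noise samples, x1: data samples, both of shape (batch, dim)
    t = torch.rand(x0.shape[0], 1)            # t ~ U[0, 1]
    xt = (1 - t) * x0 + t * x1                # mix(x0, x1, t): straight-line interpolation
    target = x1 - x0                          # constant velocity along that straight path
    pred = velocity_net(xt, t)                # nn(xt, t)
    return torch.mean((target - pred) ** 2)   # mse((x1 - x0) - nn(xt, t))
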

Seohong Park (@seohong_park)

Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓

Neerja Thakkar (@neerjathakkar)

Can we systematically generalize AR "word models" into "world models"? Our CVPR 2025 paper introduces a unified, general framework designed to model real-world, multi-agent interactions by disentangling task-specific modeling from behavior prediction.

Chongyi Zheng (@chongyiz1)

1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.
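
A heavily hedged sketch of the contrast described above; the geometric sampling of the prediction horizon and the names model, intention_z, and gamma are my own illustrative assumptions, not details from the paper:

import torch

def multi_step_prediction_loss(model, traj_obs, intention_z, t, gamma=0.99):
    # traj_obs: (T, obs_dim) observations of one trajectory; intention_z: (z_dim,) intention latent
    # Instead of the immediate next observation, pick a target many steps ahead
    # (offset k ~ Geometric(1 - gamma)) and condition the prediction on the intention.
    k = int(torch.distributions.Geometric(1 - gamma).sample().item()) + 1
    future_t = min(t + k, traj_obs.shape[0] - 1)
    pred = model(traj_obs[t], intention_z)     # predict a far-future observation
    return torch.mean((pred - traj_obs[future_t]) ** 2)
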