Jesse Zhang (@jesse_y_zhang)'s Twitter Profile
Jesse Zhang

@jesse_y_zhang

PhD student at USC working on Robotics/Deep RL. Advisors: Erdem Biyik, Jesse Thomason, Joseph J. Lim. Focused on scalable, sample-efficient robot adaptation.

ID: 1341621435136032770

Website: https://jessezhang.net · Joined: 23-12-2020 05:47:43

91 Tweets

729 Followers

374 Following

Ilir Aliu - eu/acc (@iliraliu_):

Most robot policies fail because… they don’t know where to look or what to focus on. PEEK fixes that, using vision-language models (VLMs) to guide any visuomotor policy.
✅ Adds VLM-generated overlays showing “where” and “what” directly on training images
✅ Works with ACT,
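For intuition, here is a minimal sketch of what a “where/what” overlay could look like, assuming the VLM returns a 2D motion path and a target point in pixel coordinates for each training frame (the function and argument names are hypothetical, not PEEK's actual interface):

```python
import cv2
import numpy as np

def overlay_vlm_guidance(image, path_points, target_point):
    """Draw a VLM-predicted 'where to go' path and a 'what to focus on'
    target point onto a training image (hypothetical overlay format)."""
    out = image.copy()
    # Connect consecutive path points with line segments ("where").
    for (x0, y0), (x1, y1) in zip(path_points[:-1], path_points[1:]):
        cv2.line(out, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)
    # Mark the attention target with a filled circle ("what").
    cv2.circle(out, (int(target_point[0]), int(target_point[1])), 6, (0, 0, 255), -1)
    return out

# Example: annotate one frame before feeding it to the visuomotor policy.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
annotated = overlay_vlm_guidance(frame, [(30, 200), (80, 150), (120, 100)], (150, 80))
```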

Andrej Karpathy (@karpathy):

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,

Abhishek Gupta (@abhishekunique7):

Combinatorial complexity is often the bane of imitation learning - including VLA models! Jesse Zhang and Marius Memmel proposed a way around this, using VLMs to perform problem reduction for imitation. The insight is simple - 1) High-level VLM takes a complex scene/task and
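Reading between the lines of the truncated recipe, the reduction plausibly works like this: a high-level VLM maps a cluttered scene and long-horizon task down to one simple sub-task plus the object that matters for it, and a low-level imitation policy executes that reduced problem. A rough sketch under that assumption (the vlm.ask / policy.act / env interfaces are made up for illustration, not the paper's API):

```python
def vlm_reduce(vlm, image, task):
    """Ask a high-level VLM to reduce a complex scene/task to one simple
    sub-task and the object that matters for it (hypothetical prompt)."""
    prompt = (
        f"Task: {task}. Which single object should the robot manipulate next, "
        "and what short sub-task should it perform? Answer as 'object; sub-task'."
    )
    answer = vlm.ask(image, prompt)  # assumed VLM interface
    obj, subtask = [s.strip() for s in answer.split(";")]
    return obj, subtask

def run_episode(vlm, policy, env, task, horizon=200):
    """Alternate between VLM problem reduction and low-level imitation."""
    obs = env.reset()
    for _ in range(horizon):
        # High level: reduce the full problem to one the policy has seen.
        obj, subtask = vlm_reduce(vlm, obs["image"], task)
        # Low level: condition the imitation policy on the reduced sub-task.
        action = policy.act(obs["image"], instruction=f"{subtask} the {obj}")
        obs, done = env.step(action)  # assumed env interface
        if done:
            break
```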

Jiahui Zhang (@jiahuizhang__32):

We’re excited to release the code for our CoRL 2025 (Oral) paper: “ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations.”
🌐 Website: rewind-reward.github.io
📄 Arxiv: arxiv.org/abs/2505.10911
💻 Code: github.com/rewind-reward/…
x.com/Jesse_Y_Zhang/…

Kun Lei (@kunlei15):

Introducing RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning. lei-kun.github.io/RL-100/ 7 real robot tasks, 900/900 successes. Up to 250 consecutive trials in one task, running 2 hours nonstop without failure. High success rate against physical

Yiğit Korkmaz (@yigitkkorkmaz):

Can Q-learning alone handle continuous actions? Value-based RL (like DQN) is simple & stable, but typically limited to discrete actions. Continuous control usually needs actor-critic methods (DDPG, TD3, SAC) that are powerful but unstable & can get stuck in local optima.

Erdem Bıyık (@ebiyik_):

Actor-critic RL but there is no actor 🤯 because the critic can control the system even with a continuous action space! The result: more stable RL and better robustness against local optima (because there is no separate training for an actor). Check out our NeurIPS paper :) 👇
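How can a critic act on its own in a continuous action space? The usual blocker for value-based methods is the argmax over actions; one generic workaround is to approximate that argmax at decision time by scoring sampled candidate actions with the Q-network. A minimal sketch of that idea (plain sampling-based Q maximization, not necessarily the mechanism used in the paper above):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, a) critic over continuous states and actions."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def act_with_critic_only(q_net, state, action_low, action_high, num_samples=256):
    """Pick an action without an actor: sample candidates uniformly in the
    action box and return the one the critic scores highest."""
    candidates = torch.rand(num_samples, action_low.shape[0]) \
        * (action_high - action_low) + action_low
    states = state.unsqueeze(0).expand(num_samples, -1)
    with torch.no_grad():
        q_values = q_net(states, candidates)
    return candidates[q_values.argmax()]

# Usage: 4-D state, 2-D continuous action in [-1, 1]^2.
q = QNetwork(state_dim=4, action_dim=2)
a = act_with_critic_only(q, torch.zeros(4), torch.full((2,), -1.0), torch.ones(2))
```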

Apurva Badithela (@apurvabadithela):

Robotic manipulation has seen tremendous progress in recent years but rigorous evaluation of robot policies remains a challenge! We present our work: "Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators"! 🧵

Abhishek Gupta (@abhishekunique7):

Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only
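Under that framing, a planner does not need pixel-accurate rollouts: it can score candidate action sequences by how confidently a model answers task-relevant questions about the resulting future. A hypothetical sketch of such a loop (the qa_world_model interface and the question are illustrative, not the paper's):

```python
import numpy as np

def score_plan(qa_world_model, obs, actions, question="Will the mug end up on the shelf?"):
    """Score an action sequence by the model's probability of answering 'yes'
    to a task-relevant question about the future, not by pixel reconstruction."""
    return qa_world_model.prob_yes(obs, actions, question)  # assumed interface

def plan(qa_world_model, obs, action_dim=7, horizon=16, num_candidates=64):
    """Random-shooting planner: sample candidate action sequences and keep the
    one with the highest question-answering score."""
    candidates = np.random.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    scores = [score_plan(qa_world_model, obs, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```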

Mateo Guaman Castro (@mateoguaman):

How can we create a single navigation policy that works for different robots in diverse environments AND can reach navigation goals with high precision? Happy to share our new paper, "VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable

Abhishek Gupta (@abhishekunique7):

Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to

Jiafei Duan (@djiafei):

I’m on the academic market this year and am actively seeking faculty positions in robot learning. My work focuses on developing efficient robotics foundation models with strong priors for reasoning and generalization. Please ping me if there are any opportunities!

Paul Zhou (@zhiyuan_zhou_):

Very excited to finally share what I’ve been up to at Physical Intelligence for the past 6 months: developing advantage-conditioned VLAs! We are finally moving beyond imitating teleop data, and towards improving models with suboptimal deployment data using scalable real-world RL. 👇🧵
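Advantage conditioning in its simplest form: label deployment trajectories with an estimated advantage, train the policy to predict actions conditioned on that label, then ask for high-advantage behavior at test time. A toy sketch under those assumptions (a small MLP standing in for a VLA; not Physical Intelligence's actual recipe):

```python
import torch
import torch.nn as nn

class AdvantageConditionedPolicy(nn.Module):
    """Toy policy mapping (observation, advantage label) -> action.
    A real VLA would consume images and language; this is a stand-in."""
    def __init__(self, obs_dim=10, action_dim=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs, advantage):
        return self.net(torch.cat([obs, advantage], dim=-1))

def training_step(policy, optimizer, obs, actions, advantages):
    """Behavior cloning on deployment data, conditioned on per-step advantage.
    Suboptimal data is kept: the label tells the model how good each action was."""
    pred = policy(obs, advantages)
    loss = ((pred - actions) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Train on a batch of (possibly suboptimal) deployment data ...
policy = AdvantageConditionedPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs, actions, advantages = torch.randn(32, 10), torch.randn(32, 7), torch.randn(32, 1)
training_step(policy, optimizer, obs, actions, advantages)
# ... and at deployment condition on a high advantage to ask for better-than-data behavior.
best_action = policy(torch.randn(1, 10), torch.tensor([[2.0]]))
```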

Jason Ma (@jasonma2020):

Some of my favorites on reward modeling, RL, and robust VLA from the community:
arxiv.org/abs/2505.10911 from Jesse Zhang
arxiv.org/abs/2510.14830 from Kun Lei
arxiv.org/abs/2509.25358 from Qianzhong Chen
and pi0.6 from Physical Intelligence today: physicalintelligence.company/blog/pistar06