Jesse Zhang (@jesse_y_zhang)'s Twitter Profile
Jesse Zhang

@jesse_y_zhang

PhD student at USC working on Robotics/Deep RL. Advisors: Erdem Biyik, Jesse Thomason, Joseph J. Lim. Focused on scalable, sample-efficient robot adaptation.

ID: 1341621435136032770

Website: https://jessezhang.net · Joined: 23-12-2020 05:47:43

91 Tweets

729 Followers

374 Following

Ilir Aliu - eu/acc (@iliraliu_):

Most robot policies fail because… they don’t know where to look or what to focus on. PEEK fixes that, using vision-language models (VLMs) to guide any visuomotor policy.
✅ Adds VLM-generated overlays showing “where” and “what” directly on training images
✅ Works with ACT,
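For intuition, here is a minimal sketch of what a “where/what” overlay could look like, assuming the VLM returns a 2D motion path and a target point in pixel coordinates for each training frame (the function and argument names are hypothetical, not PEEK's actual interface):

```python
import cv2
import numpy as np

def overlay_vlm_guidance(image, path_points, target_point):
    """Draw a VLM-predicted 'where to go' path and a 'what to focus on'
    target point onto a training image (hypothetical overlay format)."""
    out = image.copy()
    # Connect consecutive path points with line segments ("where").
    for (x0, y0), (x1, y1) in zip(path_points[:-1], path_points[1:]):
        cv2.line(out, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)
    # Mark the attention target with a filled circle ("what").
    cv2.circle(out, (int(target_point[0]), int(target_point[1])), 6, (0, 0, 255), -1)
    return out

# Example: annotate one frame before feeding it to the visuomotor policy.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
annotated = overlay_vlm_guidance(frame, [(30, 200), (80, 150), (120, 100)], (150, 80))
```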

Andrej Karpathy (@karpathy):

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,

Abhishek Gupta (@abhishekunique7):

Combinatorial complexity is often the bane of imitation learning - including VLA models! Jesse Zhang and Marius Memmel proposed a way around this, using VLMs to perform problem reduction for imitation. The insight is simple - 1) High-level VLM takes a complex scene/task and
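Reading between the lines of the truncated recipe, the reduction plausibly works like this: a high-level VLM maps a cluttered scene and long-horizon task down to one simple sub-task plus the object that matters for it, and a low-level imitation policy executes that reduced problem. A rough sketch under that assumption (the vlm.ask / policy.act / env interfaces are made up for illustration, not the paper's API):

```python
def vlm_reduce(vlm, image, task):
    """Ask a high-level VLM to reduce a complex scene/task to one simple
    sub-task and the object that matters for it (hypothetical prompt)."""
    prompt = (
        f"Task: {task}. Which single object should the robot manipulate next, "
        "and what short sub-task should it perform? Answer as 'object; sub-task'."
    )
    answer = vlm.ask(image, prompt)  # assumed VLM interface
    obj, subtask = [s.strip() for s in answer.split(";")]
    return obj, subtask

def run_episode(vlm, policy, env, task, horizon=200):
    """Alternate between VLM problem reduction and low-level imitation."""
    obs = env.reset()
    for _ in range(horizon):
        # High level: reduce the full problem to one the policy has seen.
        obj, subtask = vlm_reduce(vlm, obs["image"], task)
        # Low level: condition the imitation policy on the reduced sub-task.
        action = policy.act(obs["image"], instruction=f"{subtask} the {obj}")
        obs, done = env.step(action)  # assumed env interface
        if done:
            break
```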

Jiahui Zhang (@jiahuizhang__32):

We’re excited to release the code for our CoRL 2025 (Oral) paper: “ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations.”
🌐 Website: rewind-reward.github.io
📄 Arxiv: arxiv.org/abs/2505.10911
💻 Code: github.com/rewind-reward/…
x.com/Jesse_Y_Zhang/…

Kun Lei (@kunlei15):

Introducing RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning. lei-kun.github.io/RL-100/ 7 real robot tasks, 900/900 successes. Up to 250 consecutive trials in one task, running 2 hours nonstop without failure. High success rate against physical

Yiğit Korkmaz (@yigitkkorkmaz):

Can Q-learning alone handle continuous actions? Value-based RL (like DQN) is simple & stable, but typically limited to discrete actions. Continuous control usually needs actor-critic methods (DDPG, TD3, SAC) that are powerful but unstable & can get stuck in local optima.

Erdem Bıyık (@ebiyik_):

Actor-critic RL but there is no actor 🤯 because the critic can control the system even with a continuous action space! The result: more stable RL and better robustness against local optima (because there is no separate training for an actor). Check out our NeurIPS paper :) 👇
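How can a critic act on its own in a continuous action space? The usual blocker for value-based methods is the argmax over actions; one generic workaround is to approximate that argmax at decision time by scoring sampled candidate actions with the Q-network. A minimal sketch of that idea (plain sampling-based Q maximization, not necessarily the mechanism used in the paper above):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, a) critic over continuous states and actions."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def act_with_critic_only(q_net, state, action_low, action_high, num_samples=256):
    """Pick an action without an actor: sample candidates uniformly in the
    action box and return the one the critic scores highest."""
    candidates = torch.rand(num_samples, action_low.shape[0]) \
        * (action_high - action_low) + action_low
    states = state.unsqueeze(0).expand(num_samples, -1)
    with torch.no_grad():
        q_values = q_net(states, candidates)
    return candidates[q_values.argmax()]

# Usage: 4-D state, 2-D continuous action in [-1, 1]^2.
q = QNetwork(state_dim=4, action_dim=2)
a = act_with_critic_only(q, torch.zeros(4), torch.full((2,), -1.0), torch.ones(2))
```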

Apurva Badithela (@apurvabadithela):

Robotic manipulation has seen tremendous progress in recent years but rigorous evaluation of robot policies remains a challenge! We present our work: "Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators"! 🧵

Abhishek Gupta (@abhishekunique7):

Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only
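Under that framing, a planner does not need pixel-accurate rollouts: it can score candidate action sequences by how confidently a model answers task-relevant questions about the resulting future. A hypothetical sketch of such a loop (the qa_world_model interface and the question are illustrative, not the paper's):

```python
import numpy as np

def score_plan(qa_world_model, obs, actions, question="Will the mug end up on the shelf?"):
    """Score an action sequence by the model's probability of answering 'yes'
    to a task-relevant question about the future, not by pixel reconstruction."""
    return qa_world_model.prob_yes(obs, actions, question)  # assumed interface

def plan(qa_world_model, obs, action_dim=7, horizon=16, num_candidates=64):
    """Random-shooting planner: sample candidate action sequences and keep the
    one with the highest question-answering score."""
    candidates = np.random.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    scores = [score_plan(qa_world_model, obs, a) for a in candidates]
    return candidates[int(np.argmax(scores))]
```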

Mateo Guaman Castro (@mateoguaman):

How can we create a single navigation policy that works for different robots in diverse environments AND can reach navigation goals with high precision? Happy to share our new paper, "VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable

Abhishek Gupta (@abhishekunique7):

Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to

Jiafei Duan (@djiafei):

I’m on the academic market this year and am actively seeking faculty positions in robot learning. My work focuses on developing efficient robotics foundation models with strong priors for reasoning and generalization. Please ping me if there are any opportunities!

Paul Zhou (@zhiyuan_zhou_):

Very excited to finally share what I’ve been up to at Physical Intelligence for the past 6 months: developing advantage-conditioned VLAs! We are finally moving beyond imitating teleop data, and towards improving models with suboptimal deployment data using scalable real-world RL. 👇🧵
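Advantage conditioning in its simplest form: label deployment trajectories with an estimated advantage, train the policy to predict actions conditioned on that label, then ask for high-advantage behavior at test time. A toy sketch under those assumptions (a small MLP standing in for a VLA; not Physical Intelligence's actual recipe):

```python
import torch
import torch.nn as nn

class AdvantageConditionedPolicy(nn.Module):
    """Toy policy mapping (observation, advantage label) -> action.
    A real VLA would consume images and language; this is a stand-in."""
    def __init__(self, obs_dim=10, action_dim=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs, advantage):
        return self.net(torch.cat([obs, advantage], dim=-1))

def training_step(policy, optimizer, obs, actions, advantages):
    """Behavior cloning on deployment data, conditioned on per-step advantage.
    Suboptimal data is kept: the label tells the model how good each action was."""
    pred = policy(obs, advantages)
    loss = ((pred - actions) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Train on a batch of (possibly suboptimal) deployment data ...
policy = AdvantageConditionedPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs, actions, advantages = torch.randn(32, 10), torch.randn(32, 7), torch.randn(32, 1)
training_step(policy, optimizer, obs, actions, advantages)
# ... and at deployment condition on a high advantage to ask for better-than-data behavior.
best_action = policy(torch.randn(1, 10), torch.tensor([[2.0]]))
```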

Jason Ma (@jasonma2020):

Some of my favorites on reward modeling, RL, and robust VLA from the community:
arxiv.org/abs/2505.10911 from Jesse Zhang
arxiv.org/abs/2510.14830 from Kun Lei
arxiv.org/abs/2509.25358 from Qianzhong Chen
and pi0.6 from Physical Intelligence today: physicalintelligence.company/blog/pistar06