Oleg Rybkin (@_oleh)'s Twitter Profile
Oleg Rybkin

@_oleh

🇺🇦 Postdoc @ Berkeley. Interested in RL at scale.

ID: 2306706864

Link: http://olehrybkin.com · Joined: 23-01-2014 14:37:12

282 Tweets

835 Followers

402 Following

fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Value-Based Deep RL Scales Predictably O Rybkin, M Nauman, P Fu, C Snell... [UC Berkeley] (2025) arxiv.org/abs/2502.04327

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

Can we make robot policy evaluation easier and less time-consuming? Introducing AutoEval, a system that *autonomously* evaluates generalist policies 24/7 and closely matches human results. We make 4 tasks 💫publicly available💫. Submit your policy at auto-eval.github.io! 🧵👇
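As a purely hypothetical illustration of the kind of observation-in/action-out policy interface a remote evaluation service like this could query (none of the names below are the real AutoEval API; see auto-eval.github.io for the actual submission flow):

    # Hypothetical policy interface for remote evaluation -- illustrative only,
    # not the actual AutoEval submission API.
    import numpy as np

    class MyPolicy:
        """Stand-in generalist policy: swap in a real trained model."""
        def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
            # Return an action vector, e.g. a 7-DoF end-effector command.
            return np.random.uniform(-1.0, 1.0, size=7)

    policy = MyPolicy()
    frame = np.zeros((224, 224, 3), dtype=np.uint8)   # dummy camera frame
    print(policy.act(frame, "pick up the object"))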

Danijar Hafner (@danijarh)'s Twitter Profile Photo

Excited to share that DreamerV3 has been published in Nature!

Dreamer solves control tasks by imagining the future outcomes of its actions inside of a continuously learned world model 🌏

It's the first agent to find diamonds in Minecraft from scratch without human data! 💎

👇
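As a rough, hedged sketch of the core idea of learning inside a world model (this is not the DreamerV3 code; every module, size, and name below is made up for illustration), a policy can be rolled out entirely in latent space and improved on the rewards it imagines:

    # Minimal latent-imagination sketch: roll the actor out inside a learned
    # dynamics model and backprop through the imagined return. Illustrative only.
    import torch
    import torch.nn as nn

    latent_dim, action_dim, horizon = 32, 4, 15

    dynamics = nn.GRUCell(action_dim, latent_dim)   # z_{t+1} = f(z_t, a_t), learned from real data
    reward_head = nn.Linear(latent_dim, 1)          # predicts reward from a latent state
    actor = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))

    def imagined_return(z0: torch.Tensor) -> torch.Tensor:
        """Roll out the policy purely inside the world model and sum predicted rewards."""
        z, total = z0, torch.zeros(z0.shape[0], 1)
        for _ in range(horizon):
            a = torch.tanh(actor(z))                # action proposed by the policy
            z = dynamics(a, z)                      # imagined next latent state
            total = total + reward_head(z)
        return total

    z0 = torch.zeros(8, latent_dim)                 # e.g. latents encoded from real observations
    (-imagined_return(z0).mean()).backward()        # actor gradients flow through the imagined rollout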
Chuning Zhu (@chuning_zhu)'s Twitter Profile Photo

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)

Oleg Rybkin (@_oleh)'s Twitter Profile Photo

Check out a new paper by Amber Xie! We show that you can do robotic imitation learning well by using a diffusion model to plan future latent states instead of actions. This planning method is also more flexible, allowing you to use suboptimal and action-free data.
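A very rough sketch of that kind of pipeline (not the paper's code; the inverse-dynamics decoding step and every module below are assumptions for illustration): denoise a trajectory of future latent states with a diffusion model, then read actions off consecutive latents.

    # Illustrative latent-space planning with a DDPM-style sampler; all networks
    # are untrained stand-ins and the interface is assumed, not the paper's.
    import torch
    import torch.nn as nn

    horizon, latent_dim, action_dim, n_steps = 8, 16, 4, 50

    denoiser = nn.Linear(latent_dim + 1, latent_dim)           # stand-in for a trained noise predictor
    inverse_dynamics = nn.Linear(2 * latent_dim, action_dim)   # a = g(z_t, z_{t+1}); an assumed component

    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    @torch.no_grad()
    def plan_latents() -> torch.Tensor:
        """DDPM-style ancestral sampling over a trajectory of future latent states."""
        traj = torch.randn(horizon, latent_dim)
        for t in reversed(range(n_steps)):
            t_embed = torch.full((horizon, 1), t / n_steps)
            eps = denoiser(torch.cat([traj, t_embed], dim=-1))   # predicted noise
            mean = (traj - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            traj = mean + torch.sqrt(betas[t]) * torch.randn_like(traj) if t > 0 else mean
        return traj

    latent_plan = plan_latents()
    actions = inverse_dynamics(torch.cat([latent_plan[:-1], latent_plan[1:]], dim=-1))
    print(actions.shape)   # (horizon - 1, action_dim)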

Aviral Kumar (@aviral_kumar2)'s Twitter Profile Photo

Oleg Rybkin will also present an oral talk on our recent work on building scaling laws for value-based RL. We find that value-based deep RL algorithms scale predictably. Talk at the Workshop on Robot Learning (WRL), April 27. Charlie Snell will then present the poster!
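As a toy illustration of what "scales predictably" can buy you (the numbers and functional form below are made up, not the paper's fits): fit a power-law trend to small runs in log-log space and extrapolate it to a larger budget.

    # Toy power-law extrapolation -- made-up numbers, not the paper's results.
    import numpy as np

    compute = np.array([1e15, 3e15, 1e16, 3e16, 1e17])    # FLOPs of small-scale runs
    perf_gap = np.array([0.52, 0.41, 0.30, 0.22, 0.16])   # e.g. 1 - normalized return

    slope, intercept = np.polyfit(np.log(compute), np.log(perf_gap), 1)
    predicted = np.exp(intercept) * 1e18 ** slope          # extrapolate to a bigger budget
    print(f"predicted gap at 1e18 FLOPs: {predicted:.3f}")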

<a href="/_oleh/">Oleg Rybkin</a> will also present an oral talk on our recent work on building scaling laws for value-based RL. We find that value-based deep RL algorithms scale predictably.

Talk at Workshop on robot learning (WRL), April 27.  <a href="/sea_snell/">Charlie Snell</a> will then present the poster!
Arthur Allshire (@arthurallshire)'s Twitter Profile Photo

our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ Hongsuk Benjamin Choi, Junyi Zhang, David McAllister)

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

This was fun, thanks for having me Chris Paxton and Michael Cho - Rbt/Acc! See the podcast for some livestream of the robot in real time and me evaluating a policy live! Or check it out for yourself at auto-eval.github.io and eval your policy in the real world without breaking a sweat.

Seohong Park (@seohong_park)'s Twitter Profile Photo

We found a way to do RL *only* with BC policies.

The idea is simple:

1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG

Here, z can be anything (e.g., goals for goal-conditioned RL).

🧵↓
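A minimal sketch of what the amplification step could look like for a discrete action space (the exact recipe in the thread may differ; the guidance weight w and the toy numbers are illustrative): the guided policy sharpens whatever the conditional BC policy prefers over the unconditional one.

    # Classifier-free-guidance-style amplification of a conditional BC policy
    # over an unconditional one (discrete actions, illustrative numbers).
    import numpy as np

    def guided_action_probs(logp_cond, logp_uncond, w=3.0):
        """log pi_w(a|s,z) is proportional to log pi(a|s) + w * (log pi(a|s,z) - log pi(a|s))."""
        guided = logp_uncond + w * (logp_cond - logp_uncond)
        guided -= guided.max()                 # numerical stability before exponentiating
        probs = np.exp(guided)
        return probs / probs.sum()

    # Toy example: the conditional policy only mildly prefers action 2;
    # guidance with w=3 turns that mild preference into a strong one.
    logp_uncond = np.log(np.array([0.25, 0.25, 0.25, 0.25]))
    logp_cond = np.log(np.array([0.20, 0.20, 0.40, 0.20]))
    print(guided_action_probs(logp_cond, logp_uncond, w=3.0))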