Oleg Rybkin (@_oleh)'s Twitter Profile
Oleg Rybkin

@_oleh

🇺🇦 Postdoc @ Berkeley. Interested in RL at scale.

ID: 2306706864

Link: http://olehrybkin.com · Joined: 23-01-2014 14:37:12

282 Tweets

835 Followers

402 Following

fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Value-Based Deep RL Scales Predictably O Rybkin, M Nauman, P Fu, C Snell... [UC Berkeley] (2025) arxiv.org/abs/2502.04327

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

Can we make robot policy evaluation easier and less time-consuming? Introducing AutoEval, a system that *autonomously* evaluates generalist policies 24/7 and closely matches human results. We make 4 tasks 💫publicly available💫. Submit your policy at auto-eval.github.io! 🧵👇
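As a purely hypothetical illustration of the kind of observation-in/action-out policy interface a remote evaluation service like this could query (none of the names below are the real AutoEval API; see auto-eval.github.io for the actual submission flow):

    # Hypothetical policy interface for remote evaluation -- illustrative only,
    # not the actual AutoEval submission API.
    import numpy as np

    class MyPolicy:
        """Stand-in generalist policy: swap in a real trained model."""
        def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
            # Return an action vector, e.g. a 7-DoF end-effector command.
            return np.random.uniform(-1.0, 1.0, size=7)

    policy = MyPolicy()
    frame = np.zeros((224, 224, 3), dtype=np.uint8)   # dummy camera frame
    print(policy.act(frame, "pick up the object"))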

Danijar Hafner (@danijarh)'s Twitter Profile Photo

Excited to share that DreamerV3 has been published in Nature!

Dreamer solves control tasks by imagining the future outcomes of its actions inside of a continuously learned world model 🌏

It's the first agent to find diamonds in Minecraft from scratch without human data! 💎

👇
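As a rough, hedged sketch of the core idea of learning inside a world model (this is not the DreamerV3 code; every module, size, and name below is made up for illustration), a policy can be rolled out entirely in latent space and improved on the rewards it imagines:

    # Minimal latent-imagination sketch: roll the actor out inside a learned
    # dynamics model and backprop through the imagined return. Illustrative only.
    import torch
    import torch.nn as nn

    latent_dim, action_dim, horizon = 32, 4, 15

    dynamics = nn.GRUCell(action_dim, latent_dim)   # z_{t+1} = f(z_t, a_t), learned from real data
    reward_head = nn.Linear(latent_dim, 1)          # predicts reward from a latent state
    actor = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))

    def imagined_return(z0: torch.Tensor) -> torch.Tensor:
        """Roll out the policy purely inside the world model and sum predicted rewards."""
        z, total = z0, torch.zeros(z0.shape[0], 1)
        for _ in range(horizon):
            a = torch.tanh(actor(z))                # action proposed by the policy
            z = dynamics(a, z)                      # imagined next latent state
            total = total + reward_head(z)
        return total

    z0 = torch.zeros(8, latent_dim)                 # e.g. latents encoded from real observations
    (-imagined_return(z0).mean()).backward()        # actor gradients flow through the imagined rollout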
Chuning Zhu (@chuning_zhu)'s Twitter Profile Photo

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)

Oleg Rybkin (@_oleh)'s Twitter Profile Photo

Check out a new paper by Amber Xie! We show that you can do robotic imitation learning well by using a diffusion model to plan future latent states instead of actions. This planning method is also more flexible, allowing you to use suboptimal and action-free data.
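A very rough sketch of that kind of pipeline (not the paper's code; the inverse-dynamics decoding step and every module below are assumptions for illustration): denoise a trajectory of future latent states with a diffusion model, then read actions off consecutive latents.

    # Illustrative latent-space planning with a DDPM-style sampler; all networks
    # are untrained stand-ins and the interface is assumed, not the paper's.
    import torch
    import torch.nn as nn

    horizon, latent_dim, action_dim, n_steps = 8, 16, 4, 50

    denoiser = nn.Linear(latent_dim + 1, latent_dim)           # stand-in for a trained noise predictor
    inverse_dynamics = nn.Linear(2 * latent_dim, action_dim)   # a = g(z_t, z_{t+1}); an assumed component

    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    @torch.no_grad()
    def plan_latents() -> torch.Tensor:
        """DDPM-style ancestral sampling over a trajectory of future latent states."""
        traj = torch.randn(horizon, latent_dim)
        for t in reversed(range(n_steps)):
            t_embed = torch.full((horizon, 1), t / n_steps)
            eps = denoiser(torch.cat([traj, t_embed], dim=-1))   # predicted noise
            mean = (traj - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            traj = mean + torch.sqrt(betas[t]) * torch.randn_like(traj) if t > 0 else mean
        return traj

    latent_plan = plan_latents()
    actions = inverse_dynamics(torch.cat([latent_plan[:-1], latent_plan[1:]], dim=-1))
    print(actions.shape)   # (horizon - 1, action_dim)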

Aviral Kumar (@aviral_kumar2)'s Twitter Profile Photo

Oleg Rybkin will also present an oral talk on our recent work on building scaling laws for value-based RL. We find that value-based deep RL algorithms scale predictably. Talk at the Workshop on Robot Learning (WRL), April 27. Charlie Snell will then present the poster!
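As a toy illustration of what "scales predictably" can buy you (the numbers and functional form below are made up, not the paper's fits): fit a power-law trend to small runs in log-log space and extrapolate it to a larger budget.

    # Toy power-law extrapolation -- made-up numbers, not the paper's results.
    import numpy as np

    compute = np.array([1e15, 3e15, 1e16, 3e16, 1e17])    # FLOPs of small-scale runs
    perf_gap = np.array([0.52, 0.41, 0.30, 0.22, 0.16])   # e.g. 1 - normalized return

    slope, intercept = np.polyfit(np.log(compute), np.log(perf_gap), 1)
    predicted = np.exp(intercept) * 1e18 ** slope          # extrapolate to a bigger budget
    print(f"predicted gap at 1e18 FLOPs: {predicted:.3f}")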

<a href="/_oleh/">Oleg Rybkin</a> will also present an oral talk on our recent work on building scaling laws for value-based RL. We find that value-based deep RL algorithms scale predictably.

Talk at Workshop on robot learning (WRL), April 27.  <a href="/sea_snell/">Charlie Snell</a> will then present the poster!
Arthur Allshire (@arthurallshire)'s Twitter Profile Photo

our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ Hongsuk Benjamin Choi, Junyi Zhang, David McAllister)

Paul Zhou (@zhiyuan_zhou_)'s Twitter Profile Photo

This was fun, thanks for having me Chris Paxton and Michael Cho - Rbt/Acc! See the podcast for some livestream of the robot in real time and me evaluating a policy live! Or check it out for yourself at auto-eval.github.io and eval your policy in the real world without breaking a sweat.

Seohong Park (@seohong_park)'s Twitter Profile Photo

We found a way to do RL *only* with BC policies.

The idea is simple:

1. Train a BC policy π(a|s)
2. Train a conditional BC policy π(a|s, z)
3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG

Here, z can be anything (e.g., goals for goal-conditioned RL).

🧵↓
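A minimal sketch of what the amplification step could look like for a discrete action space (the exact recipe in the thread may differ; the guidance weight w and the toy numbers are illustrative): the guided policy sharpens whatever the conditional BC policy prefers over the unconditional one.

    # Classifier-free-guidance-style amplification of a conditional BC policy
    # over an unconditional one (discrete actions, illustrative numbers).
    import numpy as np

    def guided_action_probs(logp_cond, logp_uncond, w=3.0):
        """log pi_w(a|s,z) is proportional to log pi(a|s) + w * (log pi(a|s,z) - log pi(a|s))."""
        guided = logp_uncond + w * (logp_cond - logp_uncond)
        guided -= guided.max()                 # numerical stability before exponentiating
        probs = np.exp(guided)
        return probs / probs.sum()

    # Toy example: the conditional policy only mildly prefers action 2;
    # guidance with w=3 turns that mild preference into a strong one.
    logp_uncond = np.log(np.array([0.25, 0.25, 0.25, 0.25]))
    logp_cond = np.log(np.array([0.20, 0.20, 0.40, 0.20]))
    print(guided_action_probs(logp_cond, logp_uncond, w=3.0))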