Max Sobol Mark (@maxsobolmark) Twitter Tweets • TwiCopy

Jingyun Yang

2 years ago

Announcing RoboFuME🤖💨, a system for autonomous & efficient real-world robot learning that 1. pre-trains a VLM reward model and a multi-task RL policy from diverse off-the-shelf demo data; 2. runs RL fine-tuning online with the VLM reward model. 🔗 robofume.github.io 🧵↓

217 likes · 1 reply · 43 reposts


Aviral Kumar

@aviral_kumar2

a year ago

How can we fine-tune generalist policies autonomously w/ RL (value functions)? Max Sobol Mark's new paper on Policy-agnostic RL provides a single way to fine-tune generalist VLAs w/ any backbone, output, size (we fine-tune 7B OpenVLA on real robot) policyagnosticrl.github.io 🧵⬇️

108 likes · 1 reply · 26 reposts


Yuxiao Qu

@quyuxiao

9 months ago

🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"! 🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it! Website: cohenqu.github.io/mrt.github.io/ 🧵[1/9]

309 likes · 6 replies · 62 reposts


Max Sobol Mark

@maxsobolmark

8 months ago

I'll be presenting Policy-Agnostic RL: Fine-Tuning of Any Policy Class and Backbone at the Robot Learning (Sunday) and GenBot (Monday) workshops as Orals at #ICLR2025! Happy to chat or meet!

31 likes · 0 replies · 4 reposts


Fahim Tajwar

@fahimtajwar10

7 months ago

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n

819 likes · 20 replies · 136 reposts


Dhruv Shah

@shahdhruv_

6 months ago

Excited to release Gemini Robotics On-Device and a bunch of goodies today 🍬 on-device VLA that you can run on a GPU 🍬 open-source MuJoCo sim (& benchmark) for bimanual dexterity 🍬 broadening access to these models to academics and developers deepmind.google/discover/blog/…

405 likes · 10 replies · 59 reposts
