Max Sobol Mark (@maxsobolmark)'s Twitter Profile
Max Sobol Mark

@maxsobolmark

PhD student at @CarnegieMellon

ID: 2202153178

Website: https://maxsobolmark.com
Joined: 30-11-2013 19:19:01

55 Tweets

145 Followers

123 Following

Jingyun Yang (@yjy0625)

Announcing RoboFuME🤖💨, a system for autonomous & efficient real-world robot learning that 1. pre-trains a VLM reward model and a multi-task RL policy from diverse off-the-shelf demo data; 2. runs RL fine-tuning online with the VLM reward model. 🔗 robofume.github.io 🧵↓

Aviral Kumar (@aviral_kumar2)

How can we fine-tune generalist policies autonomously w/ RL (value functions)? Max Sobol Mark's new paper on Policy-Agnostic RL provides a single way to fine-tune generalist VLAs w/ any backbone, output, or size (we fine-tune the 7B OpenVLA on a real robot). policyagnosticrl.github.io 🧵⬇️

Yuxiao Qu (@quyuxiao)

🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"! 🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it! Website: cohenqu.github.io/mrt.github.io/ 🧵[1/9]

Max Sobol Mark (@maxsobolmark)

I'll be presenting Policy-Agnostic RL: Fine-Tuning of Any Policy Class and Backbone at the Robot Learning (Sunday) and GenBot (Monday) workshops as Orals at #ICLR2025! Happy to chat or meet!

Fahim Tajwar (@fahimtajwar10)

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n

Dhruv Shah (@shahdhruv_)

Excited to release Gemini Robotics On-Device and a bunch of goodies today 🍬 an on-device VLA that you can run on a GPU 🍬 an open-source MuJoCo sim (& benchmark) for bimanual dexterity 🍬 broader access to these models for academics and developers deepmind.google/discover/blog/…

TEDxORTArg (@tedxortarg)

Joel Sobol Mark presents "Una segunda oportunidad: ideas y libros" ("A Second Chance: Ideas and Books"), the story of his entrepreneurial venture, at #TEDxORTArg2014

TEDxORTArg (@tedxortarg)

Joel Sobol Mark shows us that the obstacles of starting a business can be turned into advantages! #TEDxORTArg2014 youtu.be/3-GKjo6Flhc