
Karl Pertsch
@karlpertsch
Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.
ID: 3377714115
http://kpertsch.github.io 15-07-2015 19:46:33
353 Tweets
3.3K Followers
269 Following

⭐ The first foundational model available on LeRobot ⭐ Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior. It was trained by Physical Intelligence and ported to PyTorch by Pablo Montalvo 🧵
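For readers who want to try the port, here is a minimal inference sketch. The import path, checkpoint id, observation keys, and tensor shapes are assumptions based on LeRobot's usual conventions and may differ across library versions; treat this as an illustration rather than the official usage.

```python
# Minimal sketch of running the LeRobot Pi0 port from a language command.
# Class name, checkpoint id, and observation keys below are assumed, not verified.
import torch

from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy  # assumed import path

policy = PI0Policy.from_pretrained("lerobot/pi0")  # assumed checkpoint id
policy.eval()

# Dummy single-step observation: one camera image, proprioceptive state, and a
# natural-language command. Real key names depend on the dataset/robot config.
batch = {
    "observation.images.top": torch.zeros(1, 3, 224, 224),  # assumed key and shape
    "observation.state": torch.zeros(1, 14),                 # assumed key and shape
    "task": ["pick up the red block and place it in the bin"],
}

with torch.no_grad():
    action = policy.select_action(batch)  # next action (chunk) to execute on the robot
print(action.shape)
```

In a real deployment the observation dict would come from the robot's cameras and proprioception at every control step, and the returned action would be executed before querying the policy again.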




One step closer to voice mode with embodiment -- very cool! Congrats to Dorsa Sadigh and the Google robotics team! :)



Our VLA policies now generalize to new homes! The main takeaway of π-0.5 is that with good tokenization + a flexible VLA architecture you can get away with relatively little mobile manipulation data (~400h) and still get policies that generalize to cleaning unseen kitchens & bedrooms!
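As a toy illustration of what "good tokenization" of actions can mean, the sketch below compresses a continuous action chunk into a short sequence of discrete tokens by keeping only low-frequency DCT coefficients and quantizing them. This is a simplified stand-in for the idea, not the actual FAST tokenizer used for π-0.5; the horizon, bin count, and coefficient range are arbitrary.

```python
# Toy frequency-space action tokenizer (illustrative only, not the real FAST):
# compress a chunk of continuous actions into a few discrete tokens by keeping
# low-frequency DCT coefficients and quantizing them into integer bins.
import numpy as np
from scipy.fft import dct, idct

def tokenize_chunk(actions, keep=8, num_bins=256, coeff_range=4.0):
    """actions: (horizon, action_dim) chunk of continuous actions, roughly in [-1, 1]."""
    coeffs = dct(actions, axis=0, norm="ortho")[:keep]           # keep low-frequency content
    clipped = np.clip(coeffs, -coeff_range, coeff_range)
    tokens = np.round((clipped + coeff_range) / (2 * coeff_range) * (num_bins - 1))
    return tokens.astype(np.int64)                               # (keep, action_dim) token ids

def detokenize_chunk(tokens, horizon, num_bins=256, coeff_range=4.0):
    coeffs = tokens / (num_bins - 1) * (2 * coeff_range) - coeff_range
    padded = np.zeros((horizon, tokens.shape[1]))
    padded[: tokens.shape[0]] = coeffs
    return idct(padded, axis=0, norm="ortho")                    # reconstructed action chunk

chunk = np.sin(np.linspace(0, np.pi, 50))[:, None] * np.ones((1, 7))  # smooth 50x7 chunk
ids = tokenize_chunk(chunk)
recon = detokenize_chunk(ids, horizon=50)
print(ids.shape, np.abs(recon - chunk).max())
```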



How can we make robot policy evaluation easier, more accessible, and more comparable? Our answer: autonomous 24/7 evaluation in the real world. AutoEval will be presented by Sergey Levine at the Robot Learning Workshop at #ICLR25 on Sun, April 27 -- don't miss it! Oral at 2pm, poster at 2:35-3:35pm.
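For intuition about what an autonomous, around-the-clock evaluation cell has to do, here is a hypothetical sketch of the core loop: roll out the policy, score the episode with a success detector, reset the scene without a human, and accumulate statistics. All names (env, policy, success_detector, reset_script) are placeholders, not AutoEval's actual API.

```python
# Hypothetical sketch of an autonomous policy-evaluation loop in the spirit of
# 24/7 real-robot eval: roll out, score success, reset, repeat, and log stats.
# env, policy, success_detector, and reset_script are placeholders.
from dataclasses import dataclass, field

@dataclass
class EvalStats:
    episodes: int = 0
    successes: int = 0
    returns: list = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        return self.successes / max(self.episodes, 1)

def run_autonomous_eval(env, policy, success_detector, reset_script,
                        num_episodes=100, max_steps=300):
    stats = EvalStats()
    for _ in range(num_episodes):
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action = policy.select_action(obs)           # policy proposes the next action
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        stats.episodes += 1
        stats.successes += int(success_detector(obs))    # e.g. a learned success classifier
        stats.returns.append(total_reward)
        reset_script(env)                                # scripted reset so no human is needed
    return stats
```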

We are hiring a PhD research intern at FAIR w/ Mark Ibrahim & Kamalika Chaudhuri, to start this summer or fall! Potential topics: trustworthy and reliable LLMs, multi-modal LLMs and agents, post-training, and reasoning, with a focus on open science and sharing our findings in a paper at the end.


Check out Danny's paper on a single-stage VLA recipe that trains fast, has fast inference, and follows language commands well. ⚡️⚡️⚡️ The key: combine FAST tokens + flow-matching expert, and make sure those pesky diffusion gradients don't mess up your beautiful VLM backbone! :)
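A rough PyTorch sketch of the gradient-isolation idea described above: train the VLM backbone with a next-token loss on discrete FAST action tokens, and train a separate flow-matching action expert on detached backbone features so its gradients never reach the backbone. Module names and batch keys are placeholders; this illustrates the trick, not the paper's exact recipe.

```python
# Sketch: combine a FAST-token objective with a flow-matching action expert
# while keeping flow-matching gradients away from the VLM backbone.
# vlm_backbone, action_token_head, and flow_expert are placeholder modules.
import torch
import torch.nn.functional as F

def training_step(vlm_backbone, action_token_head, flow_expert, batch):
    # Backbone encodes image + language (and FAST action tokens for the LM loss).
    feats = vlm_backbone(batch["images"], batch["text_tokens"])      # (B, T, D)

    # 1) Autoregressive loss on discrete FAST action tokens updates the backbone.
    logits = action_token_head(feats)                                # (B, T, vocab)
    token_loss = F.cross_entropy(
        logits.flatten(0, 1), batch["fast_action_tokens"].flatten()
    )

    # 2) Flow-matching loss on continuous actions, computed on DETACHED features,
    #    so its gradients update only the flow expert, not the VLM backbone.
    actions = batch["actions"]                                       # (B, H, A) clean chunk
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], 1, 1, device=actions.device)    # flow time in (0, 1)
    noisy_actions = (1 - t) * noise + t * actions                    # linear interpolation path
    target_velocity = actions - noise                                # velocity-field target
    pred_velocity = flow_expert(noisy_actions, t, feats.detach())
    flow_loss = F.mse_loss(pred_velocity, target_velocity)

    return token_loss + flow_loss
```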