
Karl Pertsch
@karlpertsch
Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.
ID: 3377714115
http://kpertsch.github.io 15-07-2015 19:46:33
353 Tweets
3.3K Followers
269 Following

⭐ The first foundational model available on LeRobot ⭐ Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior. It was trained by Physical Intelligence and ported to PyTorch by Pablo Montalvo 🧵
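For readers who want to try the port, here is a minimal inference sketch. The import path, checkpoint id, observation keys, and tensor shapes are assumptions based on LeRobot's usual conventions and may differ across library versions; treat this as an illustration rather than the official usage.

```python
# Minimal sketch of running the LeRobot Pi0 port from a language command.
# Class name, checkpoint id, and observation keys below are assumed, not verified.
import torch

from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy  # assumed import path

policy = PI0Policy.from_pretrained("lerobot/pi0")  # assumed checkpoint id
policy.eval()

# Dummy single-step observation: one camera image, proprioceptive state, and a
# natural-language command. Real key names depend on the dataset/robot config.
batch = {
    "observation.images.top": torch.zeros(1, 3, 224, 224),  # assumed key and shape
    "observation.state": torch.zeros(1, 14),                 # assumed key and shape
    "task": ["pick up the red block and place it in the bin"],
}

with torch.no_grad():
    action = policy.select_action(batch)  # next action (chunk) to execute on the robot
print(action.shape)
```

In a real deployment the observation dict would come from the robot's cameras and proprioception at every control step, and the returned action would be executed before querying the policy again.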




One step closer to voice mode with embodiment -- very cool! Congrats to Dorsa Sadigh and the Google robotics team! :)



Our VLA policies now generalize to new homes! The main takeaway of π-0.5 is that with good tokenization + a flexible VLA architecture you can get away with relatively little mobile manipulation data (~400h) and still get policies that generalize to cleaning unseen kitchens & bedrooms!
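As a toy illustration of what "good tokenization" of actions can mean, the sketch below compresses a continuous action chunk into a short sequence of discrete tokens by keeping only low-frequency DCT coefficients and quantizing them. This is a simplified stand-in for the idea, not the actual FAST tokenizer used for π-0.5; the horizon, bin count, and coefficient range are arbitrary.

```python
# Toy frequency-space action tokenizer (illustrative only, not the real FAST):
# compress a chunk of continuous actions into a few discrete tokens by keeping
# low-frequency DCT coefficients and quantizing them into integer bins.
import numpy as np
from scipy.fft import dct, idct

def tokenize_chunk(actions, keep=8, num_bins=256, coeff_range=4.0):
    """actions: (horizon, action_dim) chunk of continuous actions, roughly in [-1, 1]."""
    coeffs = dct(actions, axis=0, norm="ortho")[:keep]           # keep low-frequency content
    clipped = np.clip(coeffs, -coeff_range, coeff_range)
    tokens = np.round((clipped + coeff_range) / (2 * coeff_range) * (num_bins - 1))
    return tokens.astype(np.int64)                               # (keep, action_dim) token ids

def detokenize_chunk(tokens, horizon, num_bins=256, coeff_range=4.0):
    coeffs = tokens / (num_bins - 1) * (2 * coeff_range) - coeff_range
    padded = np.zeros((horizon, tokens.shape[1]))
    padded[: tokens.shape[0]] = coeffs
    return idct(padded, axis=0, norm="ortho")                    # reconstructed action chunk

chunk = np.sin(np.linspace(0, np.pi, 50))[:, None] * np.ones((1, 7))  # smooth 50x7 chunk
ids = tokenize_chunk(chunk)
recon = detokenize_chunk(ids, horizon=50)
print(ids.shape, np.abs(recon - chunk).max())
```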



How can we make robot policy evaluation easier, more accessible, and more comparable? Our answer: autonomous 24/7 evaluation in the real world. AutoEval will be presented by Sergey Levine at the Robot Learning Workshop at #ICLR25 on Sun, April 27 -- don't miss it! Oral at 2pm, poster at 2:35-3:35pm.
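For intuition about what an autonomous, around-the-clock evaluation cell has to do, here is a hypothetical sketch of the core loop: roll out the policy, score the episode with a success detector, reset the scene without a human, and accumulate statistics. All names (env, policy, success_detector, reset_script) are placeholders, not AutoEval's actual API.

```python
# Hypothetical sketch of an autonomous policy-evaluation loop in the spirit of
# 24/7 real-robot eval: roll out, score success, reset, repeat, and log stats.
# env, policy, success_detector, and reset_script are placeholders.
from dataclasses import dataclass, field

@dataclass
class EvalStats:
    episodes: int = 0
    successes: int = 0
    returns: list = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        return self.successes / max(self.episodes, 1)

def run_autonomous_eval(env, policy, success_detector, reset_script,
                        num_episodes=100, max_steps=300):
    stats = EvalStats()
    for _ in range(num_episodes):
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action = policy.select_action(obs)           # policy proposes the next action
            obs, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        stats.episodes += 1
        stats.successes += int(success_detector(obs))    # e.g. a learned success classifier
        stats.returns.append(total_reward)
        reset_script(env)                                # scripted reset so no human is needed
    return stats
```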

We are hiring a PhD research intern at FAIR w/ Mark Ibrahim & Kamalika Chaudhuri, to start this summer or fall! Potential topics: trustworthy and reliable LLMs, multi-modal LLMs and agents, post-training, and reasoning, with a focus on open science and sharing our findings in a paper at the end.


Check out Danny's paper on a single-stage VLA recipe that trains fast, has fast inference, and follows language commands well. ⚡️⚡️⚡️ The key: combine FAST tokens + flow-matching expert, and make sure those pesky diffusion gradients don't mess up your beautiful VLM backbone! :)
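A rough PyTorch sketch of the gradient-isolation idea described above: train the VLM backbone with a next-token loss on discrete FAST action tokens, and train a separate flow-matching action expert on detached backbone features so its gradients never reach the backbone. Module names and batch keys are placeholders; this illustrates the trick, not the paper's exact recipe.

```python
# Sketch: combine a FAST-token objective with a flow-matching action expert
# while keeping flow-matching gradients away from the VLM backbone.
# vlm_backbone, action_token_head, and flow_expert are placeholder modules.
import torch
import torch.nn.functional as F

def training_step(vlm_backbone, action_token_head, flow_expert, batch):
    # Backbone encodes image + language (and FAST action tokens for the LM loss).
    feats = vlm_backbone(batch["images"], batch["text_tokens"])      # (B, T, D)

    # 1) Autoregressive loss on discrete FAST action tokens updates the backbone.
    logits = action_token_head(feats)                                # (B, T, vocab)
    token_loss = F.cross_entropy(
        logits.flatten(0, 1), batch["fast_action_tokens"].flatten()
    )

    # 2) Flow-matching loss on continuous actions, computed on DETACHED features,
    #    so its gradients update only the flow expert, not the VLM backbone.
    actions = batch["actions"]                                       # (B, H, A) clean chunk
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], 1, 1, device=actions.device)    # flow time in (0, 1)
    noisy_actions = (1 - t) * noise + t * actions                    # linear interpolation path
    target_velocity = actions - noise                                # velocity-field target
    pred_velocity = flow_expert(noisy_actions, t, feats.detach())
    flow_loss = F.mse_loss(pred_velocity, target_velocity)

    return token_loss + flow_loss
```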