Qingqing Zhao
@qingqing_zhao_
PhD candidate at Stanford
ID: 950583149846671361
https://qingqing-zhao.github.io/ 09-01-2018 04:20:58
72 Tweets
1.1K Followers
639 Following
✨ Introducing OpenVLA – an open-source vision-language-action model for robotics! 🚀 - SOTA generalist policy - 7B params - outperforms Octo, RT-2-X on zero-shot evals 🦾 - trained on 970k episodes from OpenX dataset 🤖 - fully open: model/code/data all online 🤗 🧵👇
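Since the tweet says model, code, and data are all online, here is a minimal inference sketch following the usage pattern published on the OpenVLA HuggingFace model card. The prompt template, the `unnorm_key="bridge_orig"` choice, and the `predict_action` helper come from that model card, not from the tweet itself, so treat them as assumptions; the image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load the published 7B checkpoint; the model ships custom code, hence
# trust_remote_code=True.
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda:0")

# One RGB observation from the robot's camera (placeholder file here).
image = Image.open("observation.png")
prompt = "In: What action should the robot take to pick up the remote?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# predict_action is exposed by the checkpoint's custom code; it decodes the
# generated action tokens into a continuous end-effector action, un-normalized
# with the statistics of the named training mix.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # e.g., a 7-dim array: position delta, rotation delta, gripper
```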
Cambrian-1 🪼 Through a vision-centric lens, we study every aspect of building Multimodal LLMs except the LLMs themselves. As a byproduct, we achieve superior performance at the 8B, 13B, 34B scales. 📄 arxiv.org/abs/2406.16860 🌐 cambrian-mllm.github.io 🤗 huggingface.co/nyu-visionx
Introducing Open-TeleVision 🤖: We need an intuitive and remote teleoperation interface to collect more robot data. TeleVision lets you immersively operate a robot even if you are 3,000 miles away, like in the movie *Avatar*. Open-sourced!
Want to #WalkTheDog in the metaverse? In our project at #SIGGRAPH2024 with Sebastian Starke, Yuting Ye, and Olga Sorkine-Hornung, we develop an approach to learn a common 1D phase manifold from motion datasets across different morphologies, *without* any supervision (1/2)
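The tweet does not spell out the method, so as a rough illustration of what "a 1D phase" over a motion clip can mean, here is a sketch in the spirit of periodic latent models (e.g., DeepPhase by Starke et al.): take a scalar latent curve over time and read phase and frequency off its dominant Fourier component. All names and shapes here are illustrative, not the paper's code.

```python
import numpy as np

def phase_from_latent(z: np.ndarray, fps: float) -> tuple[float, float]:
    """Extract (phase in [0, 1), frequency in Hz) from a 1-D latent curve z(t)
    via the dominant non-DC component of its FFT. Illustrative only."""
    z = z - z.mean()                          # remove the DC offset
    spectrum = np.fft.rfft(z)
    freqs = np.fft.rfftfreq(len(z), d=1.0 / fps)
    k = 1 + np.argmax(np.abs(spectrum[1:]))  # dominant non-DC frequency bin
    phase = (np.angle(spectrum[k]) / (2 * np.pi)) % 1.0
    return phase, float(freqs[k])

# Toy usage: a noisy 1.5 Hz gait cycle sampled at 30 fps.
t = np.arange(60) / 30.0
z = np.sin(2 * np.pi * 1.5 * t + 0.7) + 0.05 * np.random.randn(60)
print(phase_from_latent(z, fps=30.0))
```

Because phase is a single periodic scalar, motions from very different morphologies (human, dog) can in principle be aligned on the same 1D manifold, which is the point the tweet is making.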
My first SIGGRAPH at #SIGGRAPH2024! Chen-Hsuan Lin, Jiashu Xu, and Donglai Xiang will show 3D scene generation in real time from scratch, along with 10 other RTL participants. Join us at 6 pm today!
Introducing OFT – an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see: - 25-50x faster inference ⚡️ - SOTA 97.1% avg success rate in LIBERO 💪 - high-freq control w/ 7B model on real bimanual robot - outperforms π0, RDT-1B, DiT Policy, MDT, Diffusion Policy, ACT 🧵👇
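The tweet lists outcomes but not the recipe. Per the accompanying OFT paper, the speedup comes largely from replacing autoregressive action-token decoding with a single parallel pass that regresses a continuous chunk of future actions under an L1 loss. Below is an illustrative PyTorch sketch of such a chunked action head; the module name, dimensions, and horizon are made up for the example.

```python
import torch
import torch.nn as nn

class ChunkedActionHead(nn.Module):
    """Illustrative continuous action head: maps pooled VLA backbone features
    to a chunk of H future actions in one forward pass (no token-by-token loop)."""
    def __init__(self, embed_dim: int = 4096, action_dim: int = 7, horizon: int = 8):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 1024), nn.GELU(),
            nn.Linear(1024, horizon * action_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, embed_dim) -> (batch, horizon, action_dim)
        return self.mlp(h).view(-1, self.horizon, self.action_dim)

head = ChunkedActionHead()
h = torch.randn(2, 4096)                  # stand-in for VLA backbone features
pred = head(h)                            # (2, 8, 7) action chunk in one pass
target = torch.randn_like(pred)           # stand-in for demonstration actions
loss = torch.abs(pred - target).mean()    # L1 regression, as in the OFT recipe
```

Decoding a whole chunk at once is what enables high-frequency control with a 7B model: one backbone forward yields many control steps instead of one token at a time.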
Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer this. Did you know that "juice shops became a thing in NYC" and "miles of overpasses were painted BLUE in SF"? More at → boyangdeng.com/visual-chronic… (vid w/ 🔊)
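As a sketch of the kind of pipeline the tweet hints at (mine temporal trends from geo-tagged imagery), here is a toy aggregation loop. `query_mllm` is a hypothetical stand-in for whatever captioning model or API the project uses, and the "trend" is just a per-year tag count; nothing here is the paper's actual code.

```python
from collections import Counter

def query_mllm(image_path: str, prompt: str) -> list[str]:
    # Hypothetical stand-in: a real pipeline would send the image and prompt
    # to a multimodal LLM; canned tags are returned here so the sketch runs.
    return ["juice shop"] if "2019" in image_path else ["deli"]

def tag_trends(images_by_year: dict[int, list[str]], prompt: str) -> dict[str, dict[int, int]]:
    """Count how often each MLLM-produced tag appears per year; a tag whose
    count rises over time (e.g. 'juice shop') is a candidate visual trend."""
    trends: dict[str, Counter] = {}
    for year, paths in images_by_year.items():
        for path in paths:
            for tag in query_mllm(path, prompt):
                trends.setdefault(tag, Counter())[year] += 1
    return {tag: dict(counts) for tag, counts in trends.items()}

print(tag_trends(
    {2013: ["sv_2013_001.jpg"], 2019: ["sv_2019_001.jpg"]},
    prompt="List the storefront types visible in this image.",
))
```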