Danny Driess (@dannydriess) 's Twitter Profile
Danny Driess

@dannydriess

Research Scientist @GoogleDeepMind

ID: 1425030305501663282

Link: http://dannydriess.github.io | Joined: 10-08-2021 09:45:00

109 Tweets

2.2K Followers

296 Following

Fei Xia (@xf1280) 's Twitter Profile Photo

🤖 Excited to share our project where we propose to use rewards represented in code as a flexible interface between LLMs and an optimization-based motion controller. Website: language-to-reward.github.io Want to learn more about how we make a robot dog do the moonwalk MJ style? 🕺🕺
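A minimal sketch of the "rewards as code" interface described above, under assumed placeholder names (query_llm, run_mpc, and set_reward are hypothetical, not the project's actual API): the LLM turns a natural-language instruction into reward-setting code, and an optimization-based controller then optimizes motion against those reward terms.

```python
# Hedged sketch only: query_llm and run_mpc stand in for a real LLM API and
# for the optimization-based motion controller (e.g. an MPC solver).

def query_llm(prompt: str) -> str:
    # Placeholder: a real system would call an LLM here.
    return ('set_reward("base_velocity", target=[-0.5, 0.0, 0.0])\n'
            'set_reward("torso_height", target=0.3)')

def run_mpc(reward_params: dict) -> None:
    # Placeholder: a real controller would optimize a trajectory against these terms.
    print("optimizing motion for reward terms:", reward_params)

def execute_instruction(instruction: str) -> None:
    """Natural language -> reward code (LLM) -> motion (optimization-based controller)."""
    reward_code = query_llm(
        f"Write set_reward(...) calls that make a quadruped robot: {instruction}"
    )
    reward_params = {}
    # The generated code fills reward_params through the set_reward helper.
    exec(reward_code, {"set_reward": lambda name, **kw: reward_params.update({name: kw})})
    run_mpc(reward_params)

execute_instruction("moonwalk like MJ")
```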

Joan Puigcerver (@joapuipe) 's Twitter Profile Photo

Introducing Soft MoE! Sparse MoEs are a popular method for increasing the model size without increasing its cost, but they come with several issues. Soft MoEs avoid them and significantly outperform ViT and different Sparse MoEs on image classification. arxiv.org/abs/2308.00951
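For intuition, here is a minimal numpy sketch of the Soft MoE routing described in the paper, with a single linear map standing in for each expert MLP: every slot is a softmax-weighted average over tokens, each expert processes its own slots, and every output token is a softmax-weighted average over the slot outputs.

```python
import numpy as np

def soft_moe_layer(X, Phi, expert_weights):
    """
    X:              (n_tokens, d)  input tokens
    Phi:            (d, n_slots)   learned slot parameters
    expert_weights: one (d, d) matrix per expert; slots are split evenly
                    across experts (a linear map stands in for the expert MLP).
    """
    n_slots = Phi.shape[1]
    n_experts = len(expert_weights)
    slots_per_expert = n_slots // n_experts

    logits = X @ Phi                              # (n_tokens, n_slots)

    # Dispatch: every slot is a softmax-weighted average over *tokens*.
    D = np.exp(logits - logits.max(axis=0, keepdims=True))
    D = D / D.sum(axis=0, keepdims=True)          # softmax over tokens, per slot
    slot_inputs = D.T @ X                         # (n_slots, d)

    # Each expert processes its own group of slots.
    slot_outputs = np.concatenate([
        slot_inputs[e * slots_per_expert:(e + 1) * slots_per_expert] @ expert_weights[e]
        for e in range(n_experts)
    ])                                            # (n_slots, d)

    # Combine: every output token is a softmax-weighted average over *slots*.
    C = np.exp(logits - logits.max(axis=1, keepdims=True))
    C = C / C.sum(axis=1, keepdims=True)          # softmax over slots, per token
    return C @ slot_outputs                       # (n_tokens, d)

X = np.random.randn(16, 32)                            # 16 tokens, d = 32
Phi = np.random.randn(32, 8)                           # 8 slots in total
experts = [np.random.randn(32, 32) for _ in range(4)]  # 4 experts, 2 slots each
Y = soft_moe_layer(X, Phi, experts)                    # (16, 32)
```

Because every slot mixes all tokens with soft weights, there is no discrete routing step, so the usual sparse-MoE issues such as token dropping and non-differentiable expert assignment do not arise.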

Tao Tu (@taotu831) 's Twitter Profile Photo

Excited to push the forefront of multimodal LLMs for Medicine! We previewed an ambitious generalist approach with Med-PaLM M last week as the first demonstration of a generalist biomedical AI system that flexibly encodes and integrates multimodal biomedical data.

Keerthana Gopalakrishnan (@keerthanpg) 's Twitter Profile Photo

Not many people know this, but RT-2 and our robots were on the front page of The New York Times! Just got my personal copy. Go read the paper if you haven't yet: robotics-transformer2.github.io/assets/rt2.pdf

Jeannette Bohg (@leto__jean) 's Twitter Profile Photo

Recognizing symbols like "dish in dishwasher" or "cup on table" enables 🤖 task planning. But how do we get data to train models for recognizing symbols? Introducing "Grounding Predicates through Actions" to automatically label human video datasets 🧵 sites.google.com/stanford.edu/g…

Fei Xia (@xf1280) 's Twitter Profile Photo

With Q-Transformer out, we can visualize the progress of our team consolidating robotics models. Starting from SayCan, we needed separate models to plan, predict affordance, and act. We arrived at RT-2, where a single model can do it all, in the spirit of foundation models.

Fei Xia (@xf1280) 's Twitter Profile Photo

Note we don't consolidate just for the sake of having fewer models. One benefit of consolidation is generalization: models trained on a mixture of tasks outperform single-task models. PaLM-E is a good example. x.com/DannyDriess/st…

Sergey Levine (@svlevine) 's Twitter Profile Photo

RT-1 and RT-2 showed the power of large Transformer policies, but relied on imitation learning. Can we train them with RL? Q-Transformer is "RT + RL" -- a robot Transformer trained with offline RL! Based on CQL, Q-T enables robot Transformers to use suboptimal data. 🧵👇
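As a rough illustration (not the paper's exact objective), a conservative Q-learning style loss fits the Q-values of dataset actions to Bellman targets while pushing the Q-values of unseen actions down, which is what allows training from suboptimal offline data without overestimating untried actions. The sketch below assumes discretized actions and a single action dimension.

```python
import numpy as np

def conservative_q_loss(q_values, action_idx, bellman_targets, alpha=1.0):
    """
    q_values:        (batch, n_action_bins)  Q-head outputs for one action dimension
    action_idx:      (batch,)                index of the action taken in the dataset
    bellman_targets: (batch,)                r + gamma * max_a' Q_target(s', a')
    alpha:           weight of the conservative term
    """
    rows = np.arange(len(action_idx))
    q_data = q_values[rows, action_idx]

    # TD error on the actions that actually appear in the (possibly suboptimal) dataset.
    td_loss = np.mean((q_data - bellman_targets) ** 2)

    # Conservative term: push Q-values of all *unseen* actions toward zero so the
    # learned policy cannot exploit overestimated values outside the data.
    unseen = np.ones_like(q_values, dtype=bool)
    unseen[rows, action_idx] = False
    conservative = np.mean(q_values[unseen] ** 2)

    return td_loss + alpha * conservative

q = np.random.randn(4, 8)                 # 4 states, 8 discretized action bins
a = np.array([1, 5, 2, 7])                # dataset actions
targets = np.array([0.5, 0.2, 0.9, 0.1])  # Bellman targets
loss = conservative_q_loss(q, a, targets)
```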

Pete Florence (@peteflorence) 's Twitter Profile Photo

Introduction by way of a massively oversimplified Haiku --

VLM problem:
Suck at 3D reasoning
Generate data :)

Actually getting this done, at scale, comes with a very creative pipeline, and lots of analysis. Awesome work led by Boyuan Chen and amazing hosting by Fei Xia!

Fei Xia (@xf1280) 's Twitter Profile Photo

We propose PIVOT: Iterative Visual Prompting, a new visual prompting method to elicit knowledge about action / motion / spatial understanding in VLMs. For example, we can ask "How should I make a smiley face with the fruits" and our method will generate arrows to guide you.
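The loop is simple to outline. Below is a hypothetical sketch of iterative visual prompting, assuming stub helpers annotate_with_arrows() and ask_vlm_to_pick() in place of a real image annotator and VLM query (neither is PIVOT's actual API): sample candidate points, draw them on the image as numbered arrows, ask the VLM which ones fit the instruction, then refit and shrink the sampling distribution around its picks.

```python
import numpy as np

def annotate_with_arrows(image, candidates):
    """Stub: draw a numbered arrow for each 2D candidate onto a copy of the image."""
    return image

def ask_vlm_to_pick(annotated_image, n_keep):
    """Stub: ask the VLM which numbered arrows best match the instruction."""
    return list(range(n_keep))

def iterative_visual_prompting(image, n_candidates=16, n_keep=4, iterations=3):
    mean, std = np.zeros(2), np.ones(2)           # distribution over 2D image points
    for _ in range(iterations):
        candidates = mean + std * np.random.randn(n_candidates, 2)
        annotated = annotate_with_arrows(image, candidates)
        chosen = ask_vlm_to_pick(annotated, n_keep)
        picked = candidates[chosen]
        # Refit and shrink the sampling distribution around the VLM's choices.
        mean, std = picked.mean(axis=0), picked.std(axis=0) + 1e-3
    return mean                                    # final point / action estimate

target_point = iterative_visual_prompting(image=None)  # with a real image and VLM call
```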

Danny Driess (@dannydriess) 's Twitter Profile Photo

Check out PIVOT pivot-prompt.github.io, where we show that by visually prompting VLMs with image annotations we can achieve zero-shot robot control!

Danny Driess (@dannydriess) 's Twitter Profile Photo

One exciting aspect of our ALOHA Unleashed 🌋 policies is that they are very persistent. In this video, you can see the ALOHA robots reorient the shirt until the task is finally solved. Make sure to check out Tony Z. Zhao's and Ayzaan Wahid's threads for more videos.

Danny Driess (@dannydriess) 's Twitter Profile Photo

Check out Ayzaan Wahid's thread for more ALOHA Unleashed 🌋 videos. Google DeepMind

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Meet our AI-powered robot that's ready to play table tennis. 🤖🏓 It's the first agent to achieve amateur human-level performance in this sport. Here's how it works. 🧵