Zhuo Xu (@drzhuoxu) 's Twitter Profile
Zhuo Xu

@drzhuoxu

Research Scientist @GoogleDeepMind, PhD @Berkeley, previously @Tsinghua_Uni.

ID: 1726882175386345472

https://drzhuoxu.github.io/ · Joined 21-11-2023 08:36:24

24 Tweets

137 Followers

85 Following

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

2️⃣ The RT-Trajectory model learns how to follow instructions by automatically adding visual outlines that describe robot motions to its training data. It takes videos from a dataset and overlays them with a 2D trajectory sketch of the robot arm’s gripper as it performs the task.
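The overlay step described above can be sketched as a toy illustration. The real RT-Trajectory pipeline is not spelled out in the tweet; the frame format, the source of the 2D gripper positions, and the drawing routine here are all assumptions.

```python
import numpy as np

def draw_segment(img, p0, p1, color=(255, 0, 0)):
    """Draw a straight line segment onto an HxWx3 uint8 image in place."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    for t in np.linspace(0.0, 1.0, n):
        x = int(round(p0[0] + t * (p1[0] - p0[0])))
        y = int(round(p0[1] + t * (p1[1] - p0[1])))
        if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
            img[y, x] = color

def overlay_trajectory(frames, gripper_xy):
    """For each frame i, overlay the 2D gripper path traced up to step i."""
    out = []
    for i, frame in enumerate(frames):
        canvas = frame.copy()
        for j in range(i):
            draw_segment(canvas, gripper_xy[j], gripper_xy[j + 1])
        out.append(canvas)
    return out
```

The key design point is that the sketch is cumulative: frame i shows the path so far, so the model sees motion history as a purely visual hint with no extra language annotation needed.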

Zhuo Xu (@drzhuoxu) 's Twitter Profile Photo

Humans are capable of vision-based 3D spatial estimation, so why shouldn't VLMs be able to do the same? We believe VLM architectures are capable enough but just lack sufficient training data, and we supply such data via careful synthesis from 2D web images. Check out Boyuan Chen's post

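The data-synthesis idea can be illustrated with a minimal sketch. The `objects` input here is a hypothetical stand-in for an upstream pipeline that has already lifted detected objects to 3D centroids; the question templates are illustrative, not the paper's actual ones.

```python
import math

def spatial_qa(objects):
    """Template spatial question-answer pairs from pairwise 3D geometry.

    objects: dict mapping object name -> (x, y, z) centroid in metres,
    assumed to come from an off-the-shelf detection + depth pipeline.
    """
    qa = []
    names = list(objects)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(objects[a], objects[b])  # Euclidean distance
            qa.append((f"How far is the {a} from the {b}?",
                       f"About {d:.1f} metres."))
    return qa
```

Run at web scale, this kind of templating turns ordinary 2D images into the quantitative 3D supervision the tweet argues VLMs are missing.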
lmsys.org (@lmsysorg) 's Twitter Profile Photo

🔥Breaking News from Arena

Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to Google for the remarkable achievement!

The race is heating up like never before! Super excited to see what's next for Bard + Gemini

AK (@_akhaliq) 's Twitter Profile Photo

Google Deepmind presents Generative Expressive Robot Behaviors using Large Language Models

paper page: huggingface.co/papers/2401.14…

People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person

Zhuo Xu (@drzhuoxu) 's Twitter Profile Photo

Having worked on door opening at Everyday Robots, I know all too well how challenging the door-opening task is. Great work on using a VLM to generate feedback signals for tackling the challenge!

Wenhao Yu (@stacormed) 's Twitter Profile Photo

“A picture is worth a thousand words”, can VLMs also read robot actions better in images than in words? We introduce PIVOT to explore this idea and enable a VLM to zero-shot “find a place to sit down and do writing” by navigating a robot to the room with the light on :)
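The sampling loop behind this idea can be sketched roughly as follows. The `score_fn` stub stands in for the VLM choosing among candidate actions drawn on the image; the sample counts, iteration count, and contraction rate are illustrative assumptions, not PIVOT's actual settings.

```python
import random

def pivot_loop(score_fn, n_samples=8, n_iters=3, spread=1.0):
    """Iterative visual-prompting sketch: sample candidate 2D action
    points, pick the best according to the scorer (a stand-in for a VLM
    ranking annotated arrows), then resample around the winner."""
    center = (0.0, 0.0)
    for _ in range(n_iters):
        cands = [(center[0] + random.uniform(-spread, spread),
                  center[1] + random.uniform(-spread, spread))
                 for _ in range(n_samples)]
        center = max(cands, key=score_fn)
        spread *= 0.5  # contract the sampling distribution each round
    return center
```

The appeal of this scheme is that the VLM never has to emit numeric actions: it only has to compare visually annotated options, and the shrinking sample distribution does the optimization.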

Zhuo Xu (@drzhuoxu) 's Twitter Profile Photo

Our interesting findings from exploring sampling-based planning in the era of large VLMs: pivot-prompt.github.io

Jeff Dean (@🏡) (@jeffdean) 's Twitter Profile Photo

Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length

Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long

Cheng Chi (@chichengcc) 's Twitter Profile Photo

Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from Stanford University designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)

Toru (@toruo_o) 's Twitter Profile Photo

Achieving bimanual dexterity with RL + Sim2Real! toruowo.github.io/bimanual-twist/ TLDR - We train two robot hands to twist bottle lids using deep RL followed by sim-to-real. A single policy trained with simple simulated bottles can generalize to drastically different real-world objects.

Tony Z. Zhao (@tonyzzhao) 's Twitter Profile Photo

Introducing 𝐀𝐋𝐎𝐇𝐀 𝐔𝐧𝐥𝐞𝐚𝐬𝐡𝐞𝐝 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI, at Google DeepMind. Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!

Ayzaan Wahid (@ayzwah) 's Twitter Profile Photo

For the past year we've been working on ALOHA Unleashed 🌋 @GoogleDeepmind - pushing the scale and dexterity of tasks on our ALOHA 2 fleet. Here is a thread with some of the coolest videos! The first task is hanging a shirt on a hanger (autonomous 1x)

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

✨PaliGemma report will hit arxiv tonight.

We tried hard to make it interesting, and not "here model. sota results. kthxbye."

So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶

Zipeng Fu (@zipengfu) 's Twitter Profile Photo

Introducing Mobility VLA - Google's foundation model for navigation - started as my intern project:
- Gemini 1.5 Pro for high-level image & text understanding
- topological graphs for low-level navigation
- supports multimodal instructions

co-leads: Zhuo Xu, Lewis Chiang, Jie Tan
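The low-level topological-graph layer can be illustrated with a minimal breadth-first search over a map of viewpoints. The node names and graph structure below are hypothetical; Mobility VLA's actual map construction and traversal details are not given in the tweet.

```python
from collections import deque

def topo_path(edges, start, goal):
    """Shortest-hop path on an undirected topological map, where nodes
    are previously visited viewpoints and edges are traversable links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable from start
```

In this division of labor, the VLM only has to name a goal viewpoint from the instruction and image; graph search handles the actual routing.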

lmsys.org (@lmsysorg) 's Twitter Profile Photo

Exciting News from Chatbot Arena!

Google DeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes.

For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive

Demis Hassabis (@demishassabis) 's Twitter Profile Photo

Never seen a competitive leaderboard that I didn't like 😀 Congrats to the Gemini team on ranking no.1 🏆 with our latest improved Gemini 1.5 Pro developer preview model, which you can try on AI studio now!