Zhuo Xu (@drzhuoxu) 's Twitter Profile
Zhuo Xu

@drzhuoxu

Research Scientist @GoogleDeepMind, PhD @Berkeley, previously @Tsinghua_Uni.

ID: 1726882175386345472

https://drzhuoxu.github.io/ · Joined 21-11-2023 08:36:24

24 Tweets

137 Followers

85 Following

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

2️⃣ The RT-Trajectory model learns how to follow instructions by automatically adding visual outlines that describe robot motions to its training data. It takes videos from a dataset and overlays them with a 2D trajectory sketch of the robot arm’s gripper as it performs the task.
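The overlay step described above can be sketched as a toy illustration. The real RT-Trajectory pipeline is not spelled out in the tweet; the frame format, the source of the 2D gripper positions, and the drawing routine here are all assumptions.

```python
import numpy as np

def draw_segment(img, p0, p1, color=(255, 0, 0)):
    """Draw a straight line segment onto an HxWx3 uint8 image in place."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    for t in np.linspace(0.0, 1.0, n):
        x = int(round(p0[0] + t * (p1[0] - p0[0])))
        y = int(round(p0[1] + t * (p1[1] - p0[1])))
        if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
            img[y, x] = color

def overlay_trajectory(frames, gripper_xy):
    """For each frame i, overlay the 2D gripper path traced up to step i."""
    out = []
    for i, frame in enumerate(frames):
        canvas = frame.copy()
        for j in range(i):
            draw_segment(canvas, gripper_xy[j], gripper_xy[j + 1])
        out.append(canvas)
    return out
```

The key design point is that the sketch is cumulative: frame i shows the path so far, so the model sees motion history as a purely visual hint with no extra language annotation needed.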

Zhuo Xu (@drzhuoxu) 's Twitter Profile Photo

Humans are capable of vision-based 3D spatial estimation, so why shouldn't VLMs be able to do the same? We believe VLM architectures are capable enough but just lack sufficient training data, and we supply such data via careful synthesis from 2D web images. Check out Boyuan Chen's post

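The data-synthesis idea can be illustrated with a minimal sketch. The `objects` input here is a hypothetical stand-in for an upstream pipeline that has already lifted detected objects to 3D centroids; the question templates are illustrative, not the paper's actual ones.

```python
import math

def spatial_qa(objects):
    """Template spatial question-answer pairs from pairwise 3D geometry.

    objects: dict mapping object name -> (x, y, z) centroid in metres,
    assumed to come from an off-the-shelf detection + depth pipeline.
    """
    qa = []
    names = list(objects)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(objects[a], objects[b])  # Euclidean distance
            qa.append((f"How far is the {a} from the {b}?",
                       f"About {d:.1f} metres."))
    return qa
```

Run at web scale, this kind of templating turns ordinary 2D images into the quantitative 3D supervision the tweet argues VLMs are missing.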
lmsys.org (@lmsysorg) 's Twitter Profile Photo

🔥Breaking News from Arena

Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to Google for the remarkable achievement!

The race is heating up like never before! Super excited to see what's next for Bard + Gemini

AK (@_akhaliq) 's Twitter Profile Photo

Google Deepmind presents Generative Expressive Robot Behaviors using Large Language Models

paper page: huggingface.co/papers/2401.14…

People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person

Zhuo Xu (@drzhuoxu) 's Twitter Profile Photo

Having worked on door opening at Everyday Robots, I know all too well how challenging the door-opening task is. Great work on using a VLM to generate feedback signals for tackling the challenge!

Wenhao Yu (@stacormed) 's Twitter Profile Photo

“A picture is worth a thousand words”, can VLMs also read robot actions better in images than in words? We introduce PIVOT to explore this idea and enable a VLM to zero-shot “find a place to sit down and do writing” by navigating a robot to the room with the light on :)
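The sampling loop behind this idea can be sketched roughly as follows. The `score_fn` stub stands in for the VLM choosing among candidate actions drawn on the image; the sample counts, iteration count, and contraction rate are illustrative assumptions, not PIVOT's actual settings.

```python
import random

def pivot_loop(score_fn, n_samples=8, n_iters=3, spread=1.0):
    """Iterative visual-prompting sketch: sample candidate 2D action
    points, pick the best according to the scorer (a stand-in for a VLM
    ranking annotated arrows), then resample around the winner."""
    center = (0.0, 0.0)
    for _ in range(n_iters):
        cands = [(center[0] + random.uniform(-spread, spread),
                  center[1] + random.uniform(-spread, spread))
                 for _ in range(n_samples)]
        center = max(cands, key=score_fn)
        spread *= 0.5  # contract the sampling distribution each round
    return center
```

The appeal of this scheme is that the VLM never has to emit numeric actions: it only has to compare visually annotated options, and the shrinking sample distribution does the optimization.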

Zhuo Xu (@drzhuoxu) 's Twitter Profile Photo

Our interesting findings from exploring sampling-based planning in the era of large VLMs: pivot-prompt.github.io

Jeff Dean (@🏡) (@jeffdean) 's Twitter Profile Photo

Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length

Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long

Cheng Chi (@chichengcc) 's Twitter Profile Photo

Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from Stanford University designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)

Toru (@toruo_o) 's Twitter Profile Photo

Achieving bimanual dexterity with RL + Sim2Real! toruowo.github.io/bimanual-twist/ TLDR - We train two robot hands to twist bottle lids using deep RL followed by sim-to-real. A single policy trained with simple simulated bottles can generalize to drastically different real-world objects.

Tony Z. Zhao (@tonyzzhao) 's Twitter Profile Photo

Introducing 𝐀𝐋𝐎𝐇𝐀 𝐔𝐧𝐥𝐞𝐚𝐬𝐡𝐞𝐝 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI, at Google DeepMind. Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!

Ayzaan Wahid (@ayzwah) 's Twitter Profile Photo

For the past year we've been working on ALOHA Unleashed 🌋 @GoogleDeepmind - pushing the scale and dexterity of tasks on our ALOHA 2 fleet. Here is a thread with some of the coolest videos! The first task is hanging a shirt on a hanger (autonomous 1x)

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

✨PaliGemma report will hit arxiv tonight.

We tried hard to make it interesting, and not "here model. sota results. kthxbye."

So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶

Zipeng Fu (@zipengfu) 's Twitter Profile Photo

Introducing Mobility VLA - Google's foundation model for navigation - started as my intern project:
- Gemini 1.5 Pro for high-level image & text understanding
- topological graphs for low-level navigation
- supports multimodal instructions

co-leads: Zhuo Xu, Lewis Chiang, Jie Tan
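The low-level topological-graph layer can be illustrated with a minimal breadth-first search over a map of viewpoints. The node names and graph structure below are hypothetical; Mobility VLA's actual map construction and traversal details are not given in the tweet.

```python
from collections import deque

def topo_path(edges, start, goal):
    """Shortest-hop path on an undirected topological map, where nodes
    are previously visited viewpoints and edges are traversable links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable from start
```

In this division of labor, the VLM only has to name a goal viewpoint from the instruction and image; graph search handles the actual routing.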

lmsys.org (@lmsysorg) 's Twitter Profile Photo

Exciting News from Chatbot Arena!

Google DeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes.

For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive

Demis Hassabis (@demishassabis) 's Twitter Profile Photo

Never seen a competitive leaderboard that I didn't like 😀 Congrats to the Gemini team on ranking no.1 🏆 with our latest improved Gemini 1.5 Pro developer preview model, which you can try on AI studio now!