Xiaohan Zhang (@xiaohanzhang220)'s Twitter Profile
Xiaohan Zhang

@xiaohanzhang220

Robotics Researcher at Boston Dynamics AI Institute

ID: 1039727416749580292

Link: https://keke-220.github.io/ · Joined: 12-09-2018 04:08:28

78 Tweets

795 Followers

473 Following

Chris Paxton (@chris_j_paxton)'s Twitter Profile Photo

Excited that this dataset is finally released. Being able to answer difficult questions from long-horizon videos is a major challenge for embodied AI applications, and will be important for useful home robots.

🇺🇦 Olexandr Maksymets (@o_maksymets)'s Twitter Profile Photo

Launching OpenEQA, our new benchmark for AI's understanding of physical environments. Despite AGI optimism, our tests with top VLMs reveal a significant gap to human-level comprehension. Let's bridge this gap in AI's world understanding.

Mikael Henaff (@henaffmikael)'s Twitter Profile Photo

Latest work where we present OpenEQA, a modern embodied Q&A benchmark that tests multiple capabilities, such as spatial reasoning, object recognition, and world knowledge, on which SOTA VLMs like GPT-4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented at #CVPR2024.

Anurag Ajay (@aajay3110)'s Twitter Profile Photo

Excited for the release of OpenEQA, a benchmark for embodied question answering that evaluates several abilities, including spatial reasoning, object localization, and world knowledge. Notably, state-of-the-art vision-language models such as GPT-4V have yet to succeed in this

Jesse Thomason (@_jessethomason_)'s Twitter Profile Photo

It's true, and the need for single-modality ablations during model building and dataset curation extends beyond classification tasks into actions too. A few years ago we found that many "embodied" agents end up either ignoring language OR ignoring vision. arxiv.org/abs/1811.00613
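The single-modality ablation this refers to is straightforward to run: evaluate the same agent with one input stream blanked out and see how much performance drops. Below is a minimal sketch under assumed interfaces; `agent.act`, `episodes`, and `check_success` are hypothetical stand-ins, not any particular codebase.

```python
# Sketch of a single-modality ablation: score the same agent with the
# vision or language input zeroed out. `agent` and `episodes` are
# hypothetical stand-ins for illustration only.
import numpy as np

def evaluate(agent, episodes, drop_vision=False, drop_language=False):
    successes = []
    for image, instruction, check_success in episodes:
        obs_image = np.zeros_like(image) if drop_vision else image
        obs_text = "" if drop_language else instruction
        successes.append(check_success(agent.act(obs_image, obs_text)))
    return float(np.mean(successes))

# If the no-vision or no-language score barely drops relative to the full
# score, the agent is effectively ignoring that modality.
# full = evaluate(agent, episodes)
# no_vision = evaluate(agent, episodes, drop_vision=True)
# no_language = evaluate(agent, episodes, drop_language=True)
```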

Anurag Ajay (@aajay3110)'s Twitter Profile Photo

I have been playing with GPT-4o on a subset of the OpenEQA benchmark (Episodic Memory, HM3D scans), and it shows a substantial improvement over GPT-4V. On this subset, GPT-4V scored 51.3 (out of 100), whereas a human agent scored 85.1. GPT-4o scores 73.2, showing a much better
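For context on these numbers: OpenEQA reports an LLM-judged match score on a 0-100 scale. The snippet below is a minimal sketch of how such an aggregate can be computed, assuming each answer is rated 1-5 by an LLM judge and then rescaled; the ratings shown are made-up placeholders, not benchmark data.

```python
# Minimal sketch of aggregating per-question LLM-judge ratings (assumed
# to be on a 1-5 scale) into a 0-100 benchmark score.

def aggregate_score(ratings: list[int]) -> float:
    """Rescale 1-5 ratings to 0-100 and average them."""
    if not ratings:
        raise ValueError("need at least one rating")
    return 100.0 * sum((r - 1) / 4 for r in ratings) / len(ratings)

example_ratings = [5, 4, 2, 5, 3]  # hypothetical judgments, not real data
print(f"aggregate score: {aggregate_score(example_ratings):.1f} / 100")
```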

Aravind Rajeswaran (@aravindr93)'s Twitter Profile Photo

Interested in knowing the difference between "world" models and "word" models? Come talk to us at the #OpenEQA poster today at #CVPR (Arch 4A-E Poster #179) x.com/AIatMeta/statu…

Chris Paxton (@chris_j_paxton)'s Twitter Profile Photo

Our paper OpenEQA is being presented at CVPR! Check out some cool, realistic, human-annotated video question answering problems, and see if you can beat GPT-4 at them!

Chris Paxton (@chris_j_paxton)'s Twitter Profile Photo

I'd like to introduce what I've been working on for the last few months at Hello Robot: Stretch AI, a set of open-source tools for language-guided autonomy, exploration, navigation, and learning from demonstration. The goal is to allow researchers and developers to quickly build

Jiao Sun (@sunjiao123sun_)'s Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans! 

Can’t believe this happened at the best AI conference, NeurIPS Conference.

We have ethical reviews for authors, but missed it for invited speakers? 😡

Boston Dynamics (@bostondynamics)'s Twitter Profile Photo

Atlas is demonstrating reinforcement learning policies developed using a motion capture suit. This demonstration was developed in partnership with Boston Dynamics and RAI Institute.
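The clip does not spell out the training recipe, but mocap-driven policies are commonly trained with a motion-tracking objective: the policy is rewarded for staying close to the reference poses captured by the suit. Below is a generic, DeepMimic-style sketch with illustrative weights, not the actual Atlas reward.

```python
# Generic motion-tracking reward: reward the policy for matching reference
# joint positions and velocities from a mocap clip. Weights and scales are
# illustrative only, not from any released system.
import numpy as np

def tracking_reward(q, qd, q_ref, qd_ref, w_pose=0.7, w_vel=0.3):
    pose_err = np.sum((q - q_ref) ** 2)
    vel_err = np.sum((qd - qd_ref) ** 2)
    return w_pose * np.exp(-2.0 * pose_err) + w_vel * np.exp(-0.1 * vel_err)

# Example with random joint states for a hypothetical 30-DoF humanoid.
rng = np.random.default_rng(0)
q, qd = 0.1 * rng.normal(size=30), 0.1 * rng.normal(size=30)
print(tracking_reward(q, qd, np.zeros(30), np.zeros(30)))
```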

Manling Li (@manlingli_)'s Twitter Profile Photo

Today is the day! Welcome to join the #CVPR2025 workshop on Foundation Models Meet Embodied Agents!

🗓️Jun 11
📍Room 214
🌐…models-meet-embodied-agents.github.io/cvpr2025/

Looking forward to learning insights from wonderful speakers Jitendra Malik, Ranjay Krishna, Katerina Fragkiadaki, Shuang Li, and Yilun Du.

Xiaohan Zhang (@xiaohanzhang220)'s Twitter Profile Photo

I’m looking for a PhD research intern to work on robot foundation models at the RAI Institute (formerly known as the Boston Dynamics AI Institute). If you have experience with imitation learning, simulation, and real robots, please feel free to DM me or apply here: jobs.lever.co/rai/f2169567-0…

Xiaohan Zhang (@xiaohanzhang220)'s Twitter Profile Photo

A world model for forecasting multi-object interactions. Suning did a great job demonstrating its effectiveness on complicated bi-manual tasks.

Mac Schwager (@macschwager)'s Twitter Profile Photo

Do world models for robots really need to predict RGB videos? Probably not! Our ParticleFormer shows that 3D structure, semantics, and fine-grained action conditioning all seem to be more important than RGB appearance. Congrats to Suning and an amazing team on this work!
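One way to picture a non-RGB world model is as a transformer over particle tokens: each tracked 3D point (plus its features) is a token, the action is broadcast to every token, and the network predicts per-particle motion. The sketch below is not ParticleFormer's actual architecture, just a minimal PyTorch stand-in to make the idea concrete.

```python
# Minimal particle-token dynamics model: each 3D particle (plus a feature
# vector) is a token, the action is broadcast to every token, and a
# transformer predicts per-particle displacement. Not the real
# ParticleFormer architecture, just an illustration of the idea.
import torch
import torch.nn as nn

class ParticleDynamics(nn.Module):
    def __init__(self, feat_dim=16, action_dim=7, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(3 + feat_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 3)  # per-particle displacement

    def forward(self, pos, feat, action):
        # pos: (B, N, 3), feat: (B, N, feat_dim), action: (B, action_dim)
        a = action.unsqueeze(1).expand(-1, pos.shape[1], -1)
        tokens = self.embed(torch.cat([pos, feat, a], dim=-1))
        return pos + self.head(self.encoder(tokens))  # predicted next positions

model = ParticleDynamics()
next_pos = model(torch.randn(2, 128, 3), torch.randn(2, 128, 16), torch.randn(2, 7))
print(next_pos.shape)  # torch.Size([2, 128, 3])
```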

RAI Institute (@rai_inst)'s Twitter Profile Photo

Using reinforcement learning, we have expanded the range of techniques the Ultra Mobile Vehicle (UMV) uses to handle terrain and obstacles, including hops, out-of-plane balance, and level-ground flips. Millions of physics-based simulations provide training data to support