Xiaohan Zhang (@xiaohanzhang220)'s Twitter Profile
Xiaohan Zhang

@xiaohanzhang220

Robotics Researcher at Boston Dynamics AI Institute

ID: 1039727416749580292

Link: https://keke-220.github.io/ · Joined: 12-09-2018 04:08:28

78 Tweets

795 Followers

473 Following

Chris Paxton (@chris_j_paxton)'s Twitter Profile Photo

Excited that this dataset is finally released. Being able to answer difficult questions from long-horizon videos is a major challenge for embodied AI applications, and will be important for useful home robots.

🇺🇦 Olexandr Maksymets (@o_maksymets)'s Twitter Profile Photo

Launching OpenEQA, our new benchmark for AI's understanding of physical environments. Despite AGI optimism, our tests with top VLMs reveal a significant gap to human-level comprehension. Let's bridge this gap in AI's world understanding.

Mikael Henaff (@henaffmikael)'s Twitter Profile Photo

Latest work where we present OpenEQA, a modern embodied Q&A benchmark that tests multiple capabilities, such as spatial reasoning, object recognition, and world knowledge, on which SOTA VLMs like GPT-4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented at #CVPR2024.

Anurag Ajay (@aajay3110)'s Twitter Profile Photo

Excited for the release of OpenEQA, a benchmark for embodied question answering that evaluates several abilities, including spatial reasoning, object localization, and world knowledge. Notably, state-of-the-art vision-language models such as GPT-4V have yet to succeed in this

Jesse Thomason (@_jessethomason_)'s Twitter Profile Photo

It's true, and the need for single-modality ablations during model building and dataset curation extends beyond classification tasks into actions too. A few years ago we found that many "embodied" agents end up either ignoring language OR ignoring vision. arxiv.org/abs/1811.00613
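The single-modality ablation this refers to is straightforward to run: evaluate the same agent with one input stream blanked out and see how much performance drops. Below is a minimal sketch under assumed interfaces; `agent.act`, `episodes`, and `check_success` are hypothetical stand-ins, not any particular codebase.

```python
# Sketch of a single-modality ablation: score the same agent with the
# vision or language input zeroed out. `agent` and `episodes` are
# hypothetical stand-ins for illustration only.
import numpy as np

def evaluate(agent, episodes, drop_vision=False, drop_language=False):
    successes = []
    for image, instruction, check_success in episodes:
        obs_image = np.zeros_like(image) if drop_vision else image
        obs_text = "" if drop_language else instruction
        successes.append(check_success(agent.act(obs_image, obs_text)))
    return float(np.mean(successes))

# If the no-vision or no-language score barely drops relative to the full
# score, the agent is effectively ignoring that modality.
# full = evaluate(agent, episodes)
# no_vision = evaluate(agent, episodes, drop_vision=True)
# no_language = evaluate(agent, episodes, drop_language=True)
```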

Anurag Ajay (@aajay3110)'s Twitter Profile Photo

I have been playing with GPT-4o on a subset of the OpenEQA benchmark (Episodic Memory, HM3D scans), and it shows a substantial improvement over GPT-4V. On this subset, GPT-4V scored 51.3 (out of 100), whereas a human agent scored 85.1. GPT-4o scores 73.2, showing a much better
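For context on these numbers: OpenEQA reports an LLM-judged match score on a 0-100 scale. The snippet below is a minimal sketch of how such an aggregate can be computed, assuming each answer is rated 1-5 by an LLM judge and then rescaled; the ratings shown are made-up placeholders, not benchmark data.

```python
# Minimal sketch of aggregating per-question LLM-judge ratings (assumed
# to be on a 1-5 scale) into a 0-100 benchmark score.

def aggregate_score(ratings: list[int]) -> float:
    """Rescale 1-5 ratings to 0-100 and average them."""
    if not ratings:
        raise ValueError("need at least one rating")
    return 100.0 * sum((r - 1) / 4 for r in ratings) / len(ratings)

example_ratings = [5, 4, 2, 5, 3]  # hypothetical judgments, not real data
print(f"aggregate score: {aggregate_score(example_ratings):.1f} / 100")
```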

Aravind Rajeswaran (@aravindr93)'s Twitter Profile Photo

Interested in knowing the difference between "world" models and "word" models? Come talk to us at the #OpenEQA poster today at #CVPR (Arch 4A-E Poster #179) x.com/AIatMeta/statu…

Chris Paxton (@chris_j_paxton)'s Twitter Profile Photo

Our paper OpenEQA is being presented at CVPR! Check out some cool, realistic, human-annotated video question answering problems, and see if you can beat GPT-4 at them!

Chris Paxton (@chris_j_paxton)'s Twitter Profile Photo

I'd like to introduce what I've been working on for the last few months at Hello Robot: Stretch AI, a set of open-source tools for language-guided autonomy, exploration, navigation, and learning from demonstration. The goal is to allow researchers and developers to quickly build

Jiao Sun (@sunjiao123sun_)'s Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans! 

Can’t believe this happened at the best AI conference, NeurIPS Conference.

We have ethical reviews for authors, but missed it for invited speakers? 😡

Boston Dynamics (@bostondynamics)'s Twitter Profile Photo

Atlas is demonstrating reinforcement learning policies developed using a motion capture suit. This demonstration was developed in partnership with Boston Dynamics and RAI Institute.
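The clip does not spell out the training recipe, but mocap-driven policies are commonly trained with a motion-tracking objective: the policy is rewarded for staying close to the reference poses captured by the suit. Below is a generic, DeepMimic-style sketch with illustrative weights, not the actual Atlas reward.

```python
# Generic motion-tracking reward: reward the policy for matching reference
# joint positions and velocities from a mocap clip. Weights and scales are
# illustrative only, not from any released system.
import numpy as np

def tracking_reward(q, qd, q_ref, qd_ref, w_pose=0.7, w_vel=0.3):
    pose_err = np.sum((q - q_ref) ** 2)
    vel_err = np.sum((qd - qd_ref) ** 2)
    return w_pose * np.exp(-2.0 * pose_err) + w_vel * np.exp(-0.1 * vel_err)

# Example with random joint states for a hypothetical 30-DoF humanoid.
rng = np.random.default_rng(0)
q, qd = 0.1 * rng.normal(size=30), 0.1 * rng.normal(size=30)
print(tracking_reward(q, qd, np.zeros(30), np.zeros(30)))
```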

Manling Li (@manlingli_)'s Twitter Profile Photo

Today is the day! Welcome to join the #CVPR2025 workshop on Foundation Models Meet Embodied Agents!

🗓️Jun 11
📍Room 214
🌐…models-meet-embodied-agents.github.io/cvpr2025/

Looking forward to learning insights from wonderful speakers Jitendra Malik, Ranjay Krishna, Katerina Fragkiadaki, Shuang Li, and Yilun Du.

Xiaohan Zhang (@xiaohanzhang220)'s Twitter Profile Photo

I’m looking for a PhD research intern to work on robot foundation models at the RAI Institute (formerly known as the Boston Dynamics AI Institute). If you have experience with imitation learning, simulation, and real robots, please feel free to DM me or apply here: jobs.lever.co/rai/f2169567-0…

Xiaohan Zhang (@xiaohanzhang220)'s Twitter Profile Photo

A world model for forecasting multi-object interactions. Suning did a great job demonstrating its effectiveness on complicated bi-manual tasks.

Mac Schwager (@macschwager)'s Twitter Profile Photo

Do world models for robots really need to predict RGB videos? Probably not! Our ParticleFormer shows that 3D structure, semantics, and fine-grained action conditioning all seem to be more important than RGB appearance. Congrats to Suning and an amazing team on this work!
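One way to picture a non-RGB world model is as a transformer over particle tokens: each tracked 3D point (plus its features) is a token, the action is broadcast to every token, and the network predicts per-particle motion. The sketch below is not ParticleFormer's actual architecture, just a minimal PyTorch stand-in to make the idea concrete.

```python
# Minimal particle-token dynamics model: each 3D particle (plus a feature
# vector) is a token, the action is broadcast to every token, and a
# transformer predicts per-particle displacement. Not the real
# ParticleFormer architecture, just an illustration of the idea.
import torch
import torch.nn as nn

class ParticleDynamics(nn.Module):
    def __init__(self, feat_dim=16, action_dim=7, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(3 + feat_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 3)  # per-particle displacement

    def forward(self, pos, feat, action):
        # pos: (B, N, 3), feat: (B, N, feat_dim), action: (B, action_dim)
        a = action.unsqueeze(1).expand(-1, pos.shape[1], -1)
        tokens = self.embed(torch.cat([pos, feat, a], dim=-1))
        return pos + self.head(self.encoder(tokens))  # predicted next positions

model = ParticleDynamics()
next_pos = model(torch.randn(2, 128, 3), torch.randn(2, 128, 16), torch.randn(2, 7))
print(next_pos.shape)  # torch.Size([2, 128, 3])
```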

RAI Institute (@rai_inst)'s Twitter Profile Photo

Using reinforcement learning, we have expanded the range of techniques the Ultra Mobile Vehicle (UMV) uses to handle terrain and obstacles, including hops, out-of-plane balance, and level-ground flips. Millions of physics-based simulations provide training data to support