Boyi Li (@boyiliee) 's Twitter Profile
Boyi Li

@boyiliee

ID: 1236009554737082369

Link: http://sites.google.com/site/boyilics/home
Joined: 06-03-2020 19:24:18

124 Tweets

2.2K Followers

323 Following

Jitendra MALIK (@jitendramalikcv) 's Twitter Profile Photo

Happy to share these exciting new results on video synthesis of humans in movement. Arguably, these establish the power of having explicit 3D representations. Popular video generation models like Sora don't do that, making it hard for the resulting video to be 4D consistent.

Marco Pavone (@drmapavone) 's Twitter Profile Photo

Introducing DreamDrive, which combines the complementary strengths of generative AI (video diffusion) and neural reconstruction (Gaussian splatting) to transform any street-view image into a dynamic 4D driving scene!

Web: pointscoder.github.io/DreamDrive/
Paper: arxiv.org/abs/2501.00601
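
For readers who want the pipeline in code terms, here is a minimal, hypothetical sketch of the two-stage idea the tweet describes: a video diffusion model expands the single street-view image into generated frames, and a dynamic Gaussian-splatting fit turns those frames into an explicit 4D scene. The function names and stand-in callables below are placeholders for illustration, not the DreamDrive codebase.

```python
from typing import Callable, Sequence


def image_to_4d_scene(
    street_view_image,
    generate_video: Callable[[object], Sequence],  # stand-in for a video diffusion prior
    fit_gaussians: Callable[[Sequence], object],   # stand-in for dynamic Gaussian-splatting reconstruction
):
    """Sketch of the generative + reconstruction split: frames first, explicit scene second."""
    frames = generate_video(street_view_image)  # generative stage: plausible drive-through frames
    return fit_gaussians(frames)                # reconstruction stage: explicit, 4D-consistent scene


# Toy usage with trivial stand-ins, just to show the data flow:
scene = image_to_4d_scene(
    "street_view.png",
    generate_video=lambda img: [img] * 8,
    fit_gaussians=lambda frames: {"num_frames": len(frames)},
)
```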
Jitendra MALIK (@jitendramalikcv) 's Twitter Profile Photo

I'm happy to post course materials for my class at UC Berkeley "Robots that Learn", taught with the outstanding assistance of Toru. Lecture videos at youtube.com/playlist?list=… Lecture notes & other course materials at robots-that-learn.github.io

Marco Pavone (@drmapavone) 's Twitter Profile Photo

Complementing DreamDrive, I am thrilled to introduce STORM, which enables fast scene reconstruction with a single feed-forward model.

STORM transforms camera logs into dynamic 3D models - in real time!

Web: jiawei-yang.github.io/STORM/
Paper: arxiv.org/abs/2501.00602
Boyi Li (@boyiliee) 's Twitter Profile Photo

Our group at #NVIDIA has a few internship positions available. We welcome talented interns to join our efforts in autonomous driving and VLMs. If you're interested, please email me your CV.

Jiageng Mao (@pointscoder) 's Twitter Profile Photo

Can Vision-Language Models (VLMs) truly understand the physical world? 🌍🔬 Introducing PhysBench – the first benchmark to evaluate VLMs’ understanding of physics! PhysBench is accepted to #ICLR2025 as an Oral presentation (only 1.8% out of 11k submissions)! 🌐 Project:

Boyi Li (@boyiliee) 's Twitter Profile Photo

Nice to see the progress in interactive task planning. It reminds me of our previous work, ITP, which incorporates both high-level planning and low-level function execution via language. x.com/Boyiliee/statu…

Marco Pavone (@drmapavone) 's Twitter Profile Photo

For the first time ever, NVIDIA is hosting an AV Safety Day at GTC - a multi-session workshop on AV safety. We will share our latest work on safe AV platforms, run-time monitoring, safety data flywheels, and more! #AutonomousVehicles #AI at #GTC25 ➡️ nvda.ws/3Xc3xPo

Boris Ivanovic (@iamborisi) 's Twitter Profile Photo

Don’t miss this deep dive into the future of autonomous vehicles! Excited to present with Jose M. Alvarez at #GTC25 on how foundation models are transforming AV technology! Check out all the session details below 👇

David Chan (@_dmchan) 's Twitter Profile Photo

🚀 New Paper Alert! 🚀 Introducing TULIP 🌷 – a multimodal framework for richer vision-language understanding! A drop-in replacement for CLIP-style models, TULIP learns fine-grained visual details while keeping strong language alignment. 🔗 tulip-berkeley.github.io 🧵👇
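
To unpack the "drop-in replacement for CLIP-style models" claim, the sketch below shows the generic scoring interface such models share: an image embedding and a text embedding in one space, compared by cosine similarity. This is plain NumPy for illustration only and assumes nothing about TULIP's actual API.

```python
import numpy as np


def clip_style_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalized image and text embeddings."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)


# Toy usage: random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb, text_emb = rng.normal(size=512), rng.normal(size=512)
print(clip_style_score(image_emb, text_emb))
```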

Marco Pavone (@drmapavone) 's Twitter Profile Photo

At #GTC2025, Jensen unveiled Halos, a comprehensive safety system for AVs and Physical AI. Halos integrates numerous technologies developed by my team at NVIDIA, and I was thrilled to help coordinate its launch alongside Riccardo Mariani and many amazing colleagues at NVIDIA DRIVE.

Baifeng (@baifeng_shi) 's Twitter Profile Photo

Next-gen vision pre-trained models shouldn’t be short-sighted.

Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage.

Today, we
Boyi Li (@boyiliee) 's Twitter Profile Photo

4K Resolution! Vision is a critical part of building powerful multimodal foundation models. Super excited about this work.

Boyi Li (@boyiliee) 's Twitter Profile Photo

Hallucination is a big challenge in video understanding for any single model. To address this, we introduce Wolf 🐺 (wolfv0.github.io): a mixture-of-experts framework designed for accurate video understanding by distilling knowledge from various Vision-Language Models.
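
A minimal sketch of the mixture-of-experts idea in the tweet above, assuming the general recipe of captioning one clip with several VLMs and letting a summarization model distill the candidates (the names below are hypothetical placeholders, not the released Wolf API):

```python
from typing import Callable, List


def fuse_video_captions(
    video_path: str,
    expert_captioners: List[Callable[[str], str]],  # stand-ins for individual VLM experts
    summarizer: Callable[[List[str]], str],         # stand-in for an LLM that merges captions
) -> str:
    """Caption the clip with every expert, then distill the candidates into one description."""
    candidates = [captioner(video_path) for captioner in expert_captioners]
    # Cross-checking multiple candidates is where single-model hallucinations can be caught.
    return summarizer(candidates)


# Toy usage with placeholder callables:
experts = [lambda v: "a car merges onto the highway",
           lambda v: "a sedan changes lanes at dusk"]
summarize = lambda captions: " / ".join(captions)  # placeholder for an LLM summarizer
print(fuse_video_captions("clip.mp4", experts, summarize))
```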

Boyi Li (@boyiliee) 's Twitter Profile Photo

👉🏻 We have released our code and benchmark data at github.com/NVlabs/Wolf. At #GTC 2025, we evaluated the safety and comfort of autonomous driving using Wolf: youtube.com/watch?v=_waPvO….

Yin Cui (@yincuicv) 's Twitter Profile Photo

Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code, models, demo, data, and benchmark at: describe-anything.github.io

Yichen Li (@antheayli) 's Twitter Profile Photo

How do we equip robots with superhuman sensory capabilities? Come join us at our RSS 2025 workshop on Multimodal Robotics with Multisensory Capabilities on June 21 to learn more. Featuring speakers: Jitendra MALIK, Katherine J. Kuchenbecker, Kristen Grauman, Yunzhu Li, Boyi Li