Boyi Li (@boyiliee) 's Twitter Profile
Boyi Li

@boyiliee

ID: 1236009554737082369

Link: http://sites.google.com/site/boyilics/home
Joined: 06-03-2020 19:24:18

124 Tweets

2.2K Followers

323 Following

Jitendra MALIK (@jitendramalikcv) 's Twitter Profile Photo

Happy to share these exciting new results on video synthesis of humans in movement. Arguably, these establish the power of having explicit 3D representations. Popular video generation models like Sora don't do that, making it hard for the resulting video to be 4D consistent.

Marco Pavone (@drmapavone) 's Twitter Profile Photo

Introducing DreamDrive, which combines the complementary strengths of generative AI (video diffusion) and neural reconstruction (Gaussian splatting) to transform any street-view image into a dynamic 4D driving scene!

Web: pointscoder.github.io/DreamDrive/
Paper: arxiv.org/abs/2501.00601
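
For readers who want the pipeline in code terms, here is a minimal, hypothetical sketch of the two-stage idea the tweet describes: a video diffusion model expands the single street-view image into generated frames, and a dynamic Gaussian-splatting fit turns those frames into an explicit 4D scene. The function names and stand-in callables below are placeholders for illustration, not the DreamDrive codebase.

```python
from typing import Callable, Sequence


def image_to_4d_scene(
    street_view_image,
    generate_video: Callable[[object], Sequence],  # stand-in for a video diffusion prior
    fit_gaussians: Callable[[Sequence], object],   # stand-in for dynamic Gaussian-splatting reconstruction
):
    """Sketch of the generative + reconstruction split: frames first, explicit scene second."""
    frames = generate_video(street_view_image)  # generative stage: plausible drive-through frames
    return fit_gaussians(frames)                # reconstruction stage: explicit, 4D-consistent scene


# Toy usage with trivial stand-ins, just to show the data flow:
scene = image_to_4d_scene(
    "street_view.png",
    generate_video=lambda img: [img] * 8,
    fit_gaussians=lambda frames: {"num_frames": len(frames)},
)
```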
Jitendra MALIK (@jitendramalikcv) 's Twitter Profile Photo

I'm happy to post course materials for my class at UC Berkeley "Robots that Learn", taught with the outstanding assistance of Toru. Lecture videos at youtube.com/playlist?list=… Lecture notes & other course materials at robots-that-learn.github.io

Marco Pavone (@drmapavone) 's Twitter Profile Photo

Complementing DreamDrive, I am thrilled to introduce STORM, which enables fast scene reconstruction with a single feed-forward model.

STORM transforms camera logs into dynamic 3D models - in real time!

Web: jiawei-yang.github.io/STORM/
Paper: arxiv.org/abs/2501.00602
Boyi Li (@boyiliee) 's Twitter Profile Photo

Our group at #NVIDIA has a few internship positions available. We welcome talented interns to join our efforts in autonomous driving and VLMs. If you're interested, please email me your CV.

Jiageng Mao (@pointscoder) 's Twitter Profile Photo

Can Vision-Language Models (VLMs) truly understand the physical world? 🌍🔬 Introducing PhysBench – the first benchmark to evaluate VLMs’ understanding of physics! PhysBench is accepted to #ICLR2025 as an Oral presentation (only 1.8% out of 11k submissions)! 🌐 Project:

Boyi Li (@boyiliee) 's Twitter Profile Photo

Nice to see the progress in interactive task planning. It reminds me of our previous work, ITP, which incorporates both high-level planning and low-level function execution via language. x.com/Boyiliee/statu…

Marco Pavone (@drmapavone) 's Twitter Profile Photo

For the first time ever, NVIDIA is hosting an AV Safety Day at GTC - a multi-session workshop on AV safety. We will share our latest work on safe AV platforms, run-time monitoring, safety data flywheels, and more! #AutonomousVehicles #AI at #GTC25 ➡️ nvda.ws/3Xc3xPo

Boris Ivanovic (@iamborisi) 's Twitter Profile Photo

Don’t miss this deep dive into the future of autonomous vehicles! Excited to present with Jose M. Alvarez at #GTC25 on how foundation models are transforming AV technology! Check out all the session details below 👇

David Chan (@_dmchan) 's Twitter Profile Photo

🚀 New Paper Alert! 🚀 Introducing TULIP 🌷 – a multimodal framework for richer vision-language understanding! A drop-in replacement for CLIP-style models, TULIP learns fine-grained visual details while keeping strong language alignment. 🔗 tulip-berkeley.github.io 🧵👇
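
To unpack the "drop-in replacement for CLIP-style models" claim, the sketch below shows the generic scoring interface such models share: an image embedding and a text embedding in one space, compared by cosine similarity. This is plain NumPy for illustration only and assumes nothing about TULIP's actual API.

```python
import numpy as np


def clip_style_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalized image and text embeddings."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)


# Toy usage: random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb, text_emb = rng.normal(size=512), rng.normal(size=512)
print(clip_style_score(image_emb, text_emb))
```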

Marco Pavone (@drmapavone) 's Twitter Profile Photo

At #GTC2025, Jensen unveiled Halos, a comprehensive safety system for AVs and Physical AI. Halos integrates numerous technologies developed by my team at NVIDIA, and I was thrilled to help coordinate its launch alongside Riccardo Mariani and many amazing colleagues at NVIDIA DRIVE.

Baifeng (@baifeng_shi) 's Twitter Profile Photo

Next-gen vision pre-trained models shouldn’t be short-sighted.

Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage.

Today, we
Boyi Li (@boyiliee) 's Twitter Profile Photo

4K Resolution! Vision is a critical part of building powerful multimodal foundation models. Super excited about this work.

Boyi Li (@boyiliee) 's Twitter Profile Photo

Hallucination is a big challenge in video understanding for any single model. To address this, we introduce Wolf 🐺 (wolfv0.github.io): a mixture-of-experts framework designed for accurate video understanding by distilling knowledge from various Vision-Language Models.
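
A minimal sketch of the mixture-of-experts idea in the tweet above, assuming the general recipe of captioning one clip with several VLMs and letting a summarization model distill the candidates (the names below are hypothetical placeholders, not the released Wolf API):

```python
from typing import Callable, List


def fuse_video_captions(
    video_path: str,
    expert_captioners: List[Callable[[str], str]],  # stand-ins for individual VLM experts
    summarizer: Callable[[List[str]], str],         # stand-in for an LLM that merges captions
) -> str:
    """Caption the clip with every expert, then distill the candidates into one description."""
    candidates = [captioner(video_path) for captioner in expert_captioners]
    # Cross-checking multiple candidates is where single-model hallucinations can be caught.
    return summarizer(candidates)


# Toy usage with placeholder callables:
experts = [lambda v: "a car merges onto the highway",
           lambda v: "a sedan changes lanes at dusk"]
summarize = lambda captions: " / ".join(captions)  # placeholder for an LLM summarizer
print(fuse_video_captions("clip.mp4", experts, summarize))
```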

Boyi Li (@boyiliee) 's Twitter Profile Photo

👉🏻 We have released our code and benchmark data at github.com/NVlabs/Wolf. At #GTC 2025, we evaluated the safety and comfort of autonomous driving using Wolf: youtube.com/watch?v=_waPvO….

Yin Cui (@yincuicv) 's Twitter Profile Photo

Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code, models, demo, data, and benchmark at: describe-anything.github.io

Yichen Li (@antheayli) 's Twitter Profile Photo

How do we equip robots with superhuman sensory capabilities? Come join us at our RSS 2025 workshop on Multimodal Robotics with Multisensory Capabilities on June 21 to learn more. Featuring speakers: Jitendra MALIK, Katherine J. Kuchenbecker, Kristen Grauman, Yunzhu Li, Boyi Li