Shiyao Xu (@xusy2333)'s Twitter Profile
Shiyao Xu

@xusy2333

ELLIS PhD student @UniTrento, MS @PKU1898, B.Eng. @DUT. Focusing on 3D human motion understanding.

ID: 1389140095471693824

Link: http://xusy2333.com · Joined: 03-05-2021 08:50:31

143 Tweets

232 Followers

614 Following

Yunzhi Zhang (@zhang_yunzhi)

Get a peek into some (surprisingly accurate) 3D scenes generated via the Scene Language—our new scene representation! Now powered by the new Claude 3.5 Sonnet. More examples in our released repo: github.com/zzyunzhi/scene…. Why is the representation effective? (1/3)

David Acuna (@davidjesusacu)

Can large VLMs reason about object sizes and distances? 🤔 We dive into this with a fresh test set in our new work. While most SoTA models struggle out of the box, instructing them to use reference objects in their reasoning paths notably improves performance! #EMNLP2024 👇
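
As a concrete illustration of the reference-object idea, here is a minimal, hypothetical prompt-building sketch; the wording, the function name, and the soda-can example are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of reference-object prompting for a VLM:
# instead of asking for a size directly, anchor the question to an
# object of roughly known size. Wording is illustrative, not the paper's.

def build_reference_prompt(target: str, reference: str, reference_cm: float) -> str:
    """Compose a size-estimation prompt anchored to a reference object."""
    return (
        f"The {reference} in the image is roughly {reference_cm:.0f} cm tall. "
        f"First compare the {target} to the {reference}, then estimate the "
        f"height of the {target} in centimeters."
    )

if __name__ == "__main__":
    # Example: estimate a bottle's height using a soda can (~12 cm) as anchor.
    print(build_reference_prompt("water bottle", "soda can", 12))
```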

Kevin Chih-Yao Ma (@chihyaoma)

The true strength of foundation models often lies in the encoder itself. NVIDIA's new Cosmos Tokenizer looks pretty interesting … Website: research.nvidia.com/labs/dir/cosmo… GitHub: github.com/NVIDIA/Cosmos-…

Wentao Zhu (@walterzhu8)

Learn how we’re redefining evaluation of human motion generation at #ICLR2025! 💃 We propose MotionCritic to: ✅Evaluate motion quality like humans 🚀Enhance motion generators with minimal fine-tuning 🔍Even help uncover artifacts in GT motion datasets! 👉motioncritic.github.io

Jun Gao (@jungao33210520)

The fundamental reason real-world video is consistent, and that we can control the camera while capturing it, is that video is a 2D projection of the underlying 3D world. Here, we bring this insight into video generation for precise camera control and consistency!
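
A quick aside on the projection claim: in the standard pinhole model (textbook notation, not anything specific to this work), every frame of a video projects the same 3D points, which is where cross-frame consistency comes from.

```latex
% Standard pinhole projection: a homogeneous world point \tilde{X} maps to a
% homogeneous pixel \tilde{x} through intrinsics K and camera pose (R, t).
% Textbook notation; not taken from the tweet's paper.
\[
  \lambda\,\tilde{x} \;=\; K\,[\,R \mid t\,]\,\tilde{X},
  \qquad \tilde{X} = (X,\, Y,\, Z,\, 1)^{\top}.
\]
% Across frames only the pose (R, t) changes while the scene points are
% shared, so controlling (R, t) controls the camera and the projected
% frames stay mutually consistent.
```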

Shiyao Xu (@xusy2333)

Does anyone know why my custom llama3.1-instruct repeatedly generates text like "Cutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024"?🥲
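
For anyone hitting the same thing: that string is the system-prompt header that Llama 3.1's default chat template inserts, and the model echoing it usually points to a prompt-formatting mismatch. A minimal sketch of the usual fix with Hugging Face transformers, assuming a standard Llama 3.1 Instruct checkpoint (the model id and message are placeholders):

```python
# A common cause of this echo is prompting a chat-tuned Llama 3.1 without
# its chat template, so the model reproduces the template's system header
# ("Cutting Knowledge Date ... Today Date ...") as ordinary text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; use your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello!"}]

# apply_chat_template inserts the special tokens and system header exactly
# as the model saw them in training, so it won't regenerate them as text.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop generation at the chat turn delimiter so the model doesn't run on
# into a fresh header for the next turn.
out = model.generate(
    inputs,
    max_new_tokens=64,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
)
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```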

Jon Barron (@jon_barron)

Here's my 3DV talk, in chapters:

1) Intro / NeRF boilerplate.
2) Recent reconstruction work.
3) Recent generative work.
4) Radiance fields as a field.
5) Why generative video has bitter-lessoned 3D.
6) Why generative video hasn't bitter-lessoned 3D.

5 & 6 are my favorites.

Jitendra MALIK (@jitendramalikcv)

Angjoo Kanazawa and I taught CS 280, graduate computer vision, this semester at UC Berkeley. We found a combination of classical and modern CV material that worked well, and are happy to share our lecture material from the class. cs280-berkeley.github.io Enjoy!

Yi Zhou (@papagina_yi)

🚀 Struggling with the lack of high-quality data for AI-driven human-object interaction research? We've got you covered! Introducing HUMOTO, a groundbreaking 4D dataset for human-object interaction, developed with a combination of wearable motion capture, SOTA 6D pose

Jeff Li (@jiefengli_jeff)

📣📣📣 Excited to share GENMO: A Generalist Model for Human Motion. Words can’t perfectly describe human motion, so we built GENMO. It’s everything to motion. 🔥Video, Text, Music, Audio, Keyframes, Spatial Control…🔥 -- GENMO handles it all within a single model. 📹 Two