Shiyao Xu (@xusy2333)'s Twitter Profile
Shiyao Xu

@xusy2333

ELLIS PhD student @UniTrento, MS @PKU1898, B.Eng. @DUT. Focusing on 3D human motion understanding.

ID: 1389140095471693824

Link: http://xusy2333.com · Joined: 03-05-2021 08:50:31

143 Tweets

232 Followers

614 Following

Yunzhi Zhang (@zhang_yunzhi)

Get a peek into some (surprisingly accurate) 3D scenes generated via the Scene Language—our new scene representation! Now powered by the new Claude 3.5 Sonnet. More examples in our released repo: github.com/zzyunzhi/scene…. Why is the representation effective? (1/3)

David Acuna (@davidjesusacu)

Can large VLMs reason about object sizes and distances? 🤔 We dive into this with a fresh test set in our new work. While most SoTA models struggle out of the box, instructing them to use reference objects in their reasoning paths notably improves performance! #EMNLP2024 👇
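
As a concrete illustration of the reference-object idea, here is a minimal, hypothetical prompt-building sketch; the wording, the function name, and the soda-can example are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of reference-object prompting for a VLM:
# instead of asking for a size directly, anchor the question to an
# object of roughly known size. Wording is illustrative, not the paper's.

def build_reference_prompt(target: str, reference: str, reference_cm: float) -> str:
    """Compose a size-estimation prompt anchored to a reference object."""
    return (
        f"The {reference} in the image is roughly {reference_cm:.0f} cm tall. "
        f"First compare the {target} to the {reference}, then estimate the "
        f"height of the {target} in centimeters."
    )

if __name__ == "__main__":
    # Example: estimate a bottle's height using a soda can (~12 cm) as anchor.
    print(build_reference_prompt("water bottle", "soda can", 12))
```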

Kevin Chih-Yao Ma (@chihyaoma)

The true strength of foundation models often lies in the encoder itself. NVIDIA's new Cosmos Tokenizer looks pretty interesting … Website: research.nvidia.com/labs/dir/cosmo… GitHub: github.com/NVIDIA/Cosmos-…

Wentao Zhu (@walterzhu8)

Learn how we’re redefining evaluation of human motion generation at #ICLR2025! 💃 We propose MotionCritic to: ✅Evaluate motion quality like humans 🚀Enhance motion generators with minimal fine-tuning 🔍Even help uncover artifacts in GT motion datasets! 👉motioncritic.github.io

Jun Gao (@jungao33210520)

The fundamental reason real-world video is consistent, and that we can control the camera while capturing it, is that video is a 2D projection of the underlying 3D world. Here, we bring this insight into video generation for precise camera control and consistency!
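
A quick aside on the projection claim: in the standard pinhole model (textbook notation, not anything specific to this work), every frame of a video projects the same 3D points, which is where cross-frame consistency comes from.

```latex
% Standard pinhole projection: a homogeneous world point \tilde{X} maps to a
% homogeneous pixel \tilde{x} through intrinsics K and camera pose (R, t).
% Textbook notation; not taken from the tweet's paper.
\[
  \lambda\,\tilde{x} \;=\; K\,[\,R \mid t\,]\,\tilde{X},
  \qquad \tilde{X} = (X,\, Y,\, Z,\, 1)^{\top}.
\]
% Across frames only the pose (R, t) changes while the scene points are
% shared, so controlling (R, t) controls the camera and the projected
% frames stay mutually consistent.
```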

Shiyao Xu (@xusy2333)

Does anyone know why my custom llama3.1-instruct repeatedly generates text like "Cutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024"?🥲
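
For anyone hitting the same thing: that string is the system-prompt header that Llama 3.1's default chat template inserts, and the model echoing it usually points to a prompt-formatting mismatch. A minimal sketch of the usual fix with Hugging Face transformers, assuming a standard Llama 3.1 Instruct checkpoint (the model id and message are placeholders):

```python
# A common cause of this echo is prompting a chat-tuned Llama 3.1 without
# its chat template, so the model reproduces the template's system header
# ("Cutting Knowledge Date ... Today Date ...") as ordinary text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; use your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello!"}]

# apply_chat_template inserts the special tokens and system header exactly
# as the model saw them in training, so it won't regenerate them as text.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop generation at the chat turn delimiter so the model doesn't run on
# into a fresh header for the next turn.
out = model.generate(
    inputs,
    max_new_tokens=64,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
)
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```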

Jon Barron (@jon_barron)

Here's my 3DV talk, in chapters:

1) Intro / NeRF boilerplate.
2) Recent reconstruction work.
3) Recent generative work.
4) Radiance fields as a field.
5) Why generative video has bitter-lessoned 3D.
6) Why generative video hasn't bitter-lessoned 3D.

5 & 6 are my favorites.

Jitendra MALIK (@jitendramalikcv)

Angjoo Kanazawa and I taught CS 280, graduate computer vision, this semester at UC Berkeley. We found a combination of classical and modern CV material that worked well, and are happy to share our lecture material from the class. cs280-berkeley.github.io Enjoy!

Yi Zhou (@papagina_yi)

🚀 Struggling with the lack of high-quality data for AI-driven human-object interaction research? We've got you covered! Introducing HUMOTO, a groundbreaking 4D dataset for human-object interaction, developed with a combination of wearable motion capture, SOTA 6D pose

Jeff Li (@jiefengli_jeff)

📣📣📣 Excited to share GENMO: A Generalist Model for Human Motion. Words can’t perfectly describe human motion, so we built GENMO. It’s everything to motion. 🔥Video, Text, Music, Audio, Keyframes, Spatial Control…🔥 -- GENMO handles it all within a single model. 📹 Two