Tianshuo Yang (@tianshuo_y) 's Twitter Profile
Tianshuo Yang

@tianshuo_y

Ph.D. Student @HKUniversity ✨ B.Eng. @ZJU_China ✨
3DV&Embodied AI

ID: 1619510728230334465

Joined: 29-01-2023 01:40:39

17 Tweets

86 Followers

552 Following

Yen-Chen Lin (@yen_chen_lin) 's Twitter Profile Photo

Video generation models exploded onto the scene in 2024, sparked by the release of Sora from OpenAI. I wrote a blog post on key techniques that are used in building large video generation models: yenchenlin.me/blog/2025/01/0…

Fei Xia (@xf1280) 's Twitter Profile Photo

On the note of vibe coding for ML, I created this fun cartpole demo with Google Gemini App 2.5 Pro Exp Canvas. It is a replica of an Exploratorium exhibit: the left cartpole is controlled by a DQN agent that learns in real time, and the right one can be controlled by a player.
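For readers unfamiliar with the agent in the demo, here is a minimal sketch of the kind of update a DQN-style agent performs. It uses a linear Q-function rather than a neural network, and the state/action dimensions are assumptions matching a standard cartpole setup (4-D state, 2 actions), not details from the demo itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed cartpole-like setup: 4-D state, 2 actions, linear Q-function.
n_state, n_actions = 4, 2
W = rng.normal(scale=0.1, size=(n_actions, n_state))  # Q(s, a) = W[a] @ s

def q_values(s):
    return W @ s

def select_action(s, eps=0.1):
    # Epsilon-greedy exploration, as in a standard DQN agent.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values(s)))

def td_update(s, a, r, s_next, done, gamma=0.99, lr=1e-2):
    # One temporal-difference step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    target = r + (0.0 if done else gamma * float(np.max(q_values(s_next))))
    td_error = target - q_values(s)[a]
    W[a] += lr * td_error * s  # gradient of the linear Q w.r.t. W[a] is s
    return td_error
```

"Learns in real time" then just means interleaving `select_action` and `td_update` with each simulation step; a real DQN swaps the linear Q for a small network plus a replay buffer and target network.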

Jon Barron (@jon_barron) 's Twitter Profile Photo

A thread of thoughts on radiance fields, from my keynote at 3DV: Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed: the NeRF was smaller than the images.
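The "posenc" Barron mentions is NeRF's positional encoding, which lifts each input coordinate into sines and cosines at geometrically spaced frequencies before the tiny MLP sees it. A minimal sketch, assuming the original NeRF formulation (10 frequency bands per coordinate):

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """NeRF-style posenc: map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)]
    for k = 0 .. num_freqs - 1, so the MLP can represent high-frequency detail."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # (L,)
    angles = x[..., None] * freqs                      # (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (..., D, 2L)
    return enc.reshape(*x.shape[:-1], -1)              # (..., D * 2L)
```

A 3-D point thus becomes a 60-D feature; the entire scene then lives in the weights of a small MLP, which is why the model could end up smaller than the training images.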

Keenan Crane (@keenanisalive) 's Twitter Profile Photo

Here's a nice "proof without words": The sum of the squares of several positive values can never be bigger than the square of their sum. This picture helps make sense of how ℓ₁ and ℓ₂ norms regularize and sparsify solutions (resp.). [1/n]
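The algebra behind the picture is short: expanding the square of the sum gives the sum of squares plus nonnegative cross terms. A small numeric check of the inequality (the random values here are illustrative, not from the thread):

```python
import random

def sum_of_squares_vs_square_of_sum(xs):
    # (sum x_i)^2 = sum x_i^2 + 2 * sum_{i<j} x_i * x_j; for positive x_i the
    # cross terms are nonnegative, so sum x_i^2 <= (sum x_i)^2.
    return sum(x * x for x in xs), sum(xs) ** 2

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.01, 10.0) for _ in range(random.randint(1, 8))]
    ss, sq = sum_of_squares_vs_square_of_sum(xs)
    assert ss <= sq
```

Equality holds only when at most one value is nonzero, which is exactly the sparsity intuition the ℓ₁-vs-ℓ₂ picture illustrates.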

Physical Intelligence (@physical_int) 's Twitter Profile Photo

We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️

Jon Barron (@jon_barron) 's Twitter Profile Photo

Here's my 3DV talk, in chapters:

1) Intro / NeRF boilerplate.
2) Recent reconstruction work.
3) Recent generative work.
4) Radiance fields as a field.
5) Why generative video has bitter-lessoned 3D.
6) Why generative video hasn't bitter-lessoned 3D.

5 & 6 are my favorites.

Rui Li (@leedaray) 's Twitter Profile Photo

🚀 Details of the #CVPR2025 award candidate papers are out. 14 of 2967 accepted papers made the list, spanning 3D vision, embodied AI, VLMs/MLLMs, learning systems, and scene understanding. 3D vision leads with the most entries. I collected the TL;DR, paper, and project links👇

Wenlong Huang (@wenlong_huang) 's Twitter Profile Photo

How do you scale visual affordance learning that is fine-grained, task-conditioned, works in the wild, and handles dynamic environments? Introducing Unsupervised Affordance Distillation (UAD): it distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this

Gordon Wetzstein (@gordonwetzstein) 's Twitter Profile Photo

The context size of video world models is only a few frames. Like a human with severe memory loss! We design a long-term memory for world models based on explicit 3D representations inspired by the human mind. This enables long-term consistency. spmem.github.io 1/3

Gordon Wetzstein (@gordonwetzstein) 's Twitter Profile Photo

The human brain uses distinct regions for visual and spatial memory. In our mechanism, conventional context frames model the visual working memory; a long-term spatial memory is modeled as an explicit point cloud; a long-term visual episodic memory is modeled using keyframes. 2/3
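The three memories described above map naturally onto three containers with different retention policies. A minimal data-structure sketch of that split; the class and field names are hypothetical, not taken from the paper:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class WorldModelMemory:
    """Sketch of the three memories described in the tweet (names hypothetical)."""
    # Visual working memory: a short FIFO window of recent context frames.
    context_frames: deque = field(default_factory=lambda: deque(maxlen=8))
    # Long-term spatial memory: an explicit, ever-growing 3D point cloud.
    point_cloud: list = field(default_factory=list)
    # Long-term visual episodic memory: sparse keyframes kept past the window.
    keyframes: list = field(default_factory=list)

    def observe(self, frame, points, is_keyframe=False):
        self.context_frames.append(frame)  # old frames fall out of the window
        self.point_cloud.extend(points)    # spatial memory only accumulates
        if is_keyframe:
            self.keyframes.append(frame)
```

The key contrast is retention: the context window forgets, while the point cloud and keyframes persist, which is what enables long-term consistency.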

Yunzhi Zhang (@zhang_yunzhi) 's Twitter Profile Photo

(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
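The core Product of Experts idea is that the combined distribution is the normalized product of each expert's density, p(x) ∝ Π_k p_k(x), i.e. a sum in log space. A toy illustration of that composition over a discrete candidate set; this is the generic PoE rule, not the paper's specific algorithm:

```python
import math

def product_of_experts(expert_logps):
    """Combine experts by multiplying densities: sum each candidate's
    log-probability across experts, then renormalize (log-sum-exp for stability)."""
    n = len(expert_logps[0])
    combined = [sum(lp[i] for lp in expert_logps) for i in range(n)]
    m = max(combined)
    weights = [math.exp(c - m) for c in combined]
    z = sum(weights)
    return [w / z for w in weights]
```

Because the product is small wherever any single expert assigns low probability, each expert (generative model, VLM, simulator) acts as a veto, which is what makes the composition controllable.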

Matthias Niessner (@mattniessner) 's Twitter Profile Photo

📢 LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans🏠✨ -> converts RGB-D scans into compact, realistic, and interactive 3D scenes — featuring high-quality meshes, PBR materials, and articulated objects. 📷youtu.be/ecK9m3LXg2c 🌍litereality.github.io

Yuxi Xiao (@yuxixiaohenry) 's Twitter Profile Photo

🚀 We release SpatialTrackerV2: the first feedforward model for dynamic 3D reconstruction and 3D point tracking — all at once! Reconstruct dynamic scenes and predict pixel-wise 3D motion in seconds. 🔗 Webpage: spatialtracker.github.io 🔍 Online Demo: huggingface.co/spaces/Yuxihen…

Pablo Vela (@pablovelagomez1) 's Twitter Profile Photo

🚀 Introducing EgoExo Forge, built on top of Rerun, Gradio, and the Hugging Face Hub. (I'll be in San Francisco July 21–29; if you're into robotics, egocentric AI, large-scale data collection, or just want to chat, DM me!) In my opinion, large-scale, diverse, and