Haian Jin (@haian_jin) Twitter Tweets • TwiCopy

Linyi Jin

9 months ago

Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.

524 likes · 13 replies · 102 reposts

Haian Jin

@haian_jin

9 months ago

I can’t attend the #NeurIPS conference this year, but Yuanbo Xiangli will present Neural Gaffer in person. Drop by our poster at West Ballroom A-D #7001 if you are interested! Time: Fri 13 Dec 4:30 p.m. — 7:30 p.m.

41 likes · 0 replies · 4 reposts

Haian Jin

@haian_jin

9 months ago

I hope this attitude is an outlier in the research community.

9 likes · 0 replies · 0 reposts

Chen Geng

@gengchen01

9 months ago

Ever wondered how roses grow and wither in your backyard?🌹 Our latest work on generating 4D temporal object intrinsics lets you explore a rose's entire lifecycle—from birth to death—under any environment light, from any viewpoint, at any moment. Project page:

188 likes · 5 replies · 37 reposts

Hanwen Jiang

@hanwenjiang1

9 months ago

💥 Think more real data is needed for scene reconstruction? Think again! Meet MegaSynth: scaling up feed-forward 3D scene reconstruction with synthesized scenes. In 3 days, it generates 700K scenes for training—70x larger than real data! ✨ The secret? Reconstruction is mostly

166 likes · 7 replies · 24 reposts

Ben Lang ᯅ

@benz145

9 months ago

Very cool (and almost *frustrating*) optical illusion. If you pause this video at any time, the shapes disappear. They exist *only* temporally, not visually. This is from Branta Games on YouTube (full video linked below)

2.2K likes · 74 replies · 240 reposts

Ruojin Cai

@ruojin8

9 months ago

🤔Can Generative Video Models Help Pose Estimation? ✅Yes! We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap. 🔗 inter-pose.github.io

223 likes · 2 replies · 34 reposts

Qianqian Wang

@qianqianwang5

7 months ago

Introducing CUT3R! An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic scenes. Video or image collections, all in one!

634 likes · 7 replies · 113 reposts

Isabella Liu

@isabella__liu

7 months ago

🐅 Want to rig your favorite meme character? Try “RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets”! ✨RigAnything is a transformer-based model that sequentially generates skeletons without predefined templates. It creates high-quality skeletons for

1.1K likes · 21 replies · 229 reposts

Haian Jin

@haian_jin

5 months ago

Our paper LVSM has been accepted as an oral presentation at #ICLR2025! See you in Singapore! We’ve just released the code and checkpoints—check it out here: github.com/haian-jin/LVSM.🚀

127 likes · 2 replies · 19 reposts

Hansheng Chen

@hanshengch

5 months ago

Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.

122 likes · 1 reply · 31 reposts

Hanwen Jiang

@hanwenjiang1

4 months ago

Supervised learning has held 3D Vision back for too long. Meet RayZer, a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images. And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy)

391 likes · 5 replies · 69 reposts

Tianyuan Zhang

@tianyuanzhang99

3 months ago

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training), a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch

390 likes · 5 replies · 74 reposts

Haian Jin

@haian_jin

3 months ago

Check out Tianyuan’s latest exciting work on long-context memory with efficient test-time training! This simple and scalable technique has broad applicability across various tasks, including large-scale scene NVS, AR video models, LLMs, …

9 likes · 0 replies · 2 reposts

Zhengqi Li

@zhengqi_li

3 months ago

Check out our new work, Self-Forcing! By addressing the training/inference mismatch, Self-Forcing enables real-time streaming video generation on a single GPU while achieving competitive or superior performance compared to SOTA video models that run significantly slower.

58 likes · 0 replies · 8 reposts

Gene Chou

@gene_ch0u

3 months ago

I'll be presenting our work with Kai Zhang at #cvpr2025. We finetune video models to be 3d consistent without any 3d supervision! Feel free to stop by our poster or reach out to chat: Sunday, Jun 15, 4-6pm ExHall D, poster #168 cvpr.thecvf.com/virtual/2025/p…

67 likes · 0 replies · 7 reposts

Rundi Wu

@chriswu6080

3 months ago

I’ll be presenting CAT4D this Sunday at 1:45pm and our poster session will start afterwards at 4pm. Feel free to come and say hi! cvpr.thecvf.com/virtual/2025/o…

38 likes · 1 reply · 2 reposts