Alex Trevithick (@alextrevith) 's Twitter Profile
Alex Trevithick

@alextrevith

PhD Student @UCSanDiego. Incoming Research Scientist @NVIDIAAI. 4D Vision, Machine Learning, Generative Models.

ID: 1315471649165189121

Website: http://alextrevithick.com · Joined: 12-10-2020 01:57:41

123 Tweets

506 Followers

271 Following

Kwang Moo Yi (@kwangmoo_yi) 's Twitter Profile Photo

Preprint of the day: Asim et al., "MEt3R: Measuring Multi-View Consistency in Generated Images" -- geometric-rl.mpi-inf.mpg.de/met3r/ Lots of diffusion-based solutions for novel-view synthesis recently, but how good are they? A metric to compare how "3D" they truly are.

Jon Barron (@jon_barron) 's Twitter Profile Photo

I just pushed a new paper to arXiv. I realized that a lot of my previous work on robust losses and nerf-y things was dancing around something simpler: a slight tweak to the classic Box-Cox power transform that makes it much more useful and stable. It's this f(x, λ) here:

Jiaming Song (@baaadas) 's Twitter Profile Photo

As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.

Jianyuan Wang (@jianyuan_wang) 's Twitter Profile Photo

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense

Xintao Wang (@xinntao) 's Twitter Profile Photo

Thanks AK for sharing our ReCamMaster! ReCamMaster can re-capture existing videos with novel camera trajectories. Project page: jianhongbai.github.io/ReCamMaster/ Paper: huggingface.co/papers/2503.11…

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)

Xingyu Chen (@roverxingyu) 's Twitter Profile Photo

🦣Easi3R: 4D Reconstruction Without Training! Limited 4D datasets? Take it easy. #Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps → making 4D reconstruction easier than ever! 🔗Page: easi3r.github.io

Alex Trevithick (@alextrevith) 's Twitter Profile Photo

What's the difference between the oai and google image generators? Giving both of them the same image and the prompt "generate this image", Gemini is essentially the identity function, whereas oai changes content. Does this indicate a continuous encoder for Gemini vs. a VQVAE for oai?

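The hypothesis in the tweet above can be illustrated with a toy sketch (this is not either model's actual architecture; all names, shapes, and the codebook here are illustrative assumptions): a continuous latent round-trips its input exactly, while vector quantization snaps each latent to its nearest codebook entry and therefore cannot be the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "latents": a continuous encoder can pass these through unchanged,
# while a VQ codebook must snap each vector to its nearest learned code.
latents = rng.normal(size=(8, 4))    # 8 latent vectors, dim 4 (hypothetical)
codebook = rng.normal(size=(16, 4))  # 16 codebook entries (hypothetical)

def vq_quantize(z, codes):
    """Replace each latent vector with its nearest codebook entry (L2)."""
    dists = np.linalg.norm(z[:, None, :] - codes[None, :, :], axis=-1)
    return codes[dists.argmin(axis=1)]

continuous_roundtrip = latents                 # identity, up to encoder fidelity
vq_roundtrip = vq_quantize(latents, codebook)  # lossy by construction

err_cont = np.abs(continuous_roundtrip - latents).max()
err_vq = np.abs(vq_roundtrip - latents).max()
print(err_cont)  # 0.0: the continuous path reproduces the input exactly
print(err_vq)    # nonzero: quantization changes content
```

If the observed behavior holds (Gemini reproduces the image, oai alters it), the quantization error in this sketch is one plausible mechanism for the difference, though decoder stochasticity or safety rewriting could explain it equally well.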
Hanwen Jiang (@hanwenjiang1) 's Twitter Profile Photo

Supervised learning has held 3D Vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy)

Bosung Kim (@bosungkim17) 's Twitter Profile Photo

Interactive looong-context reasoning still has a long way to go. We need progress across all axes: more data, bigger models, and smarter architectures. ∞-THOR is just the beginning: generate ∞-length trajectories, run agents online, train with feedback, and more! Let’s push the limits🚀

Nithin Raghavan (@nithin_raghavan) 's Twitter Profile Photo

If you’re at SIGGRAPH 2025 in Vancouver, join us Thu 2 PM for our talk “Generative Neural Materials”! We introduce a universal neural material model for bidirectional texture functions and a complementary generative pipeline. 1/2

Sherwin Bahmani (@sherwinbahmani) 's Twitter Profile Photo

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a

World Labs (@theworldlabs) 's Twitter Profile Photo

Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today!