Alex Trevithick (@alextrevith) 's Twitter Profile
Alex Trevithick

@alextrevith

PhD Student @UCSanDiego. Incoming Research Scientist @NVIDIAAI. 4D Vision, Machine Learning, Generative Models.

ID: 1315471649165189121

Website: http://alextrevithick.com · Joined: 12-10-2020 01:57:41

123 Tweets

506 Followers

271 Following

Kwang Moo Yi (@kwangmoo_yi) 's Twitter Profile Photo

Preprint of the day: Asim et al., "MEt3R: Measuring Multi-View Consistency in Generated Images" -- geometric-rl.mpi-inf.mpg.de/met3r/ Lots of diffusion-based solutions for novel-view synthesis recently, but how good are they? A metric to compare how "3D" they truly are.

Jon Barron (@jon_barron) 's Twitter Profile Photo

I just pushed a new paper to arXiv. I realized that a lot of my previous work on robust losses and nerf-y things was dancing around something simpler: a slight tweak to the classic Box-Cox power transform that makes it much more useful and stable. It's this f(x, λ) here:

Jiaming Song (@baaadas) 's Twitter Profile Photo

As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.

Jianyuan Wang (@jianyuan_wang) 's Twitter Profile Photo

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for: ✅ Camera Pose Estimation ✅ Multi-view Depth Estimation ✅ Dense

Xintao Wang (@xinntao) 's Twitter Profile Photo

Thanks AK for sharing our ReCamMaster! ReCamMaster can re-capture existing videos with novel camera trajectories. Project page: jianhongbai.github.io/ReCamMaster/ Paper: huggingface.co/papers/2503.11…

Stan Szymanowicz (@stanszymanowicz) 's Twitter Profile Photo

⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)

Xingyu Chen (@roverxingyu) 's Twitter Profile Photo

🦣Easi3R: 4D Reconstruction Without Training! Limited 4D datasets? Take it easy. #Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps → making 4D reconstruction easier than ever! 🔗Page: easi3r.github.io

Alex Trevithick (@alextrevith) 's Twitter Profile Photo

What's the difference between the oai and google image generators? Giving both of them the same image and the prompt "generate this image", Gemini is essentially the identity function, whereas oai changes content. Does this indicate a continuous encoder for Gemini vs. a VQVAE for oai?

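The hypothesis in the tweet above can be illustrated with a toy sketch (this is not either model's actual architecture; all names, shapes, and the codebook here are illustrative assumptions): a continuous latent round-trips its input exactly, while vector quantization snaps each latent to its nearest codebook entry and therefore cannot be the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "latents": a continuous encoder can pass these through unchanged,
# while a VQ codebook must snap each vector to its nearest learned code.
latents = rng.normal(size=(8, 4))    # 8 latent vectors, dim 4 (hypothetical)
codebook = rng.normal(size=(16, 4))  # 16 codebook entries (hypothetical)

def vq_quantize(z, codes):
    """Replace each latent vector with its nearest codebook entry (L2)."""
    dists = np.linalg.norm(z[:, None, :] - codes[None, :, :], axis=-1)
    return codes[dists.argmin(axis=1)]

continuous_roundtrip = latents                 # identity, up to encoder fidelity
vq_roundtrip = vq_quantize(latents, codebook)  # lossy by construction

err_cont = np.abs(continuous_roundtrip - latents).max()
err_vq = np.abs(vq_roundtrip - latents).max()
print(err_cont)  # 0.0: the continuous path reproduces the input exactly
print(err_vq)    # nonzero: quantization changes content
```

If the observed behavior holds (Gemini reproduces the image, oai alters it), the quantization error in this sketch is one plausible mechanism for the difference, though decoder stochasticity or safety rewriting could explain it equally well.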
Hanwen Jiang (@hanwenjiang1) 's Twitter Profile Photo

Supervised learning has held 3D Vision back for too long. Meet RayZer — a self-supervised 3D model trained with zero 3D labels: ❌ No supervision of camera & geometry ✅ Just RGB images And the wild part? RayZer outperforms supervised methods (as 3D labels from COLMAP are noisy)

Bosung Kim (@bosungkim17) 's Twitter Profile Photo

Interactive looong-context reasoning still has a long way to go. We need progress across all axes: more data, bigger models, and smarter architectures. ∞-THOR is just the beginning: generate ∞-length trajectories, run agents online, train with feedback, and more! Let’s push the limits🚀

Nithin Raghavan (@nithin_raghavan) 's Twitter Profile Photo

If you’re at SIGGRAPH 2025 in Vancouver, join us Thu 2 PM for our talk “Generative Neural Materials”! We introduce a universal neural material model for bidirectional texture functions and a complementary generative pipeline. 1/2

Sherwin Bahmani (@sherwinbahmani) 's Twitter Profile Photo

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a

World Labs (@theworldlabs) 's Twitter Profile Photo

Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today!