david yan (@dzyan01)'s Twitter Profile
david yan

@dzyan01

meandering researcher @PrincetonVL

ID: 1904013727235457024

Link: https://david-yan1.github.io · Joined: 24-03-2025 03:33:52

0 Tweets

12 Followers

69 Following

Kevin Wang (@kevin_wang3290)

Excited to share that our paper "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities" has won the Best Paper Award at NeurIPS '25! Hope to see you all in San Diego :)

Princeton Vision & Learning Lab (@princetonvl)

Estimating camera intrinsics from video is key to 3D reconstruction, but most methods assume they’re fixed per video. What if the camera keeps zooming and refocusing? Meet InFlux, the first benchmark with per-frame ground truth for videos with dynamic intrinsics. 🧵1/5
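
As an aside on why dynamic intrinsics matter: under zoom, the focal length changes every frame, so no single intrinsics matrix K fits the whole video. A minimal pinhole-camera sketch of that effect (values and names here are illustrative, not InFlux's data format):

```python
import numpy as np

def intrinsics(f, cx, cy):
    """Pinhole intrinsics for focal length f and principal point (cx, cy)."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def project(K, X):
    """Project a 3D point X (camera coordinates) to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

X = np.array([0.5, -0.2, 4.0])                 # one static 3D point
for t, f in enumerate([500.0, 650.0, 800.0]):  # hypothetical zoom-in over 3 frames
    K_t = intrinsics(f, cx=320.0, cy=240.0)    # intrinsics differ per frame
    print(f"frame {t}: pixel = {project(K_t, X)}")
```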

Kyle Sargent (@kylesargentai)

Vision-language models are getting better every day. Can we use them to improve image compression? Yes! For my internship, working w/ Google DeepMind and Google Research, we designed VLIC, a diffusion autoencoder post-trained with VLM preferences. Our preprint is out today! A🧵:

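The thread does not spell out VLIC's training objective, but one common way to post-train on pairwise preferences is a Bradley-Terry style loss; a hedged sketch of that general idea (the reward scores and names are illustrative, not VLIC's actual code):

```python
import torch
import torch.nn.functional as F

def preference_loss(score_preferred, score_rejected):
    """Push the VLM-preferred reconstruction to outscore the rejected one:
    -log sigmoid(s_w - s_l), averaged over pairs."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Hypothetical scores a VLM-based judge assigns to two decodings of the
# same compressed image.
s_w = torch.tensor([1.8, 0.3])   # preferred reconstructions
s_l = torch.tensor([0.9, -0.4])  # rejected reconstructions
print(preference_loss(s_w, s_l))
```
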
Princeton Vision & Learning Lab (@princetonvl)

Meet WAFT (Warping-Alone Field Transforms), our new optical-flow estimator. #1 on public benchmarks (Sintel & Spring), 1.3-4.1x faster than leading methods, and 2x lower memory. Key idea: replace cost volumes with high-res feature-space warping. Code and paper:👇
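
For context on the key idea, a minimal sketch of backward-warping a feature map by a flow field with bilinear sampling, the kind of operation the tweet says replaces cost volumes (the shapes and the (dx, dy) flow convention are assumptions for illustration, not WAFT's actual code):

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp feat (B, C, H, W) by flow (B, 2, H, W), in pixels."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().to(feat)  # (2, H, W), (x, y) order
    coords = grid.unsqueeze(0) + flow                     # shifted sample points
    # Normalize to [-1, 1] as grid_sample expects.
    cx = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    cy = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack([cx, cy], dim=-1), align_corners=True)

feat = torch.randn(1, 64, 32, 48)
flow = torch.zeros(1, 2, 32, 48)  # zero flow: warping returns the input
assert torch.allclose(warp(feat, flow), feat, atol=1e-5)
```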

Jack Zhang (@jcz42)

We made Muon run up to 2x faster for free!

Introducing Gram Newton-Schulz: a mathematically equivalent but computationally faster Newton-Schulz algorithm for polar decomposition.

Gram Newton-Schulz rewrites Newton-Schulz such that instead of iterating on the expensive […]
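
The tweet is cut off above, so for context: a sketch of the baseline Newton-Schulz iteration that Muon uses to approximate the orthogonal (polar) factor of a gradient matrix. The coefficients follow the quintic variant popularized by Muon; the Gram rewrite's own update rule is not quoted in the excerpt, so it is not reproduced here.

```python
import torch

def newton_schulz(X, steps=5, a=3.4445, b=-4.7750, c=2.0315):
    """Approximate the polar factor of X with an odd polynomial iteration;
    each step pays for large matrix products like X @ X.T."""
    X = X / (X.norm() + 1e-7)   # scale so all singular values are <= 1
    for _ in range(steps):
        A = X @ X.mT            # the expensive m x m product
        X = a * X + (b * A + c * (A @ A)) @ X
    return X

G = torch.randn(256, 512)
U = newton_schulz(G)
# Deviation from exact orthogonality; small but nonzero, since Muon's
# coefficients trade accuracy for speed.
print((U @ U.mT - torch.eye(256)).norm())
```
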
Ethan (@torchcompiled)

ML interview question: You’re training a 72B MoE MNIST classifier. Layer 53 MLP expert 7 destabilizes when the ones in the dataset are turned upside down. What happened?

Princeton Vision & Learning Lab (@princetonvl)

Stereo depth is highly useful for robots. Meet WAFT-Stereo: #1 on ETH3D (BP-0.5), Middlebury (RMSE), and KITTI (all metrics); 61% less zero-shot ETH3D BP-0.5 error; 1.8-6.7x faster than prior SOTA. Key idea: classify disparity into bins, then iterative high-res warping.🧵1/2
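
A hedged sketch of the "classify disparity into bins" idea: predict a per-pixel distribution over candidate disparities, then take a soft expectation to recover a continuous map (the bin count, range, and names are illustrative, not WAFT-Stereo's actual code):

```python
import torch

def disparity_from_bins(logits, max_disp=192.0):
    """logits: (B, D, H, W) scores over D disparity bins.
    Returns a continuous disparity map of shape (B, H, W)."""
    D = logits.shape[1]
    centers = torch.linspace(0.0, max_disp, D).view(1, D, 1, 1)  # bin centers
    probs = logits.softmax(dim=1)          # per-pixel distribution over bins
    return (probs * centers).sum(dim=1)    # expected (soft-argmax) disparity

logits = torch.randn(2, 48, 64, 128)
disp = disparity_from_bins(logits)
print(disp.shape)  # torch.Size([2, 64, 128]), values within [0, 192]
```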

Guanyu Zhou (@tmartyr4951)

It's time to systematically teach VLMs to see with synthetic images!

We built VisionFoundry, a simple and intuitive framework that generates synthetic image datasets from only a task name.

10k synthetic samples → over +10% improvement on visual perception benchmarks 👀
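
The tweet only names the interface (a task name in, a labeled image set out), so here is a hypothetical sketch of the overall shape of such a pipeline; `propose_prompts` and `generate_image` are stand-ins, not VisionFoundry's API:

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str      # text used to synthesize the image (doubles as a label source)
    image_path: str  # where the rendered image would be written

def propose_prompts(task_name: str, n: int) -> list[str]:
    """Hypothetical: ask an LLM for n scene descriptions exercising the task."""
    return [f"{task_name}: scene variant {i}" for i in range(n)]

def generate_image(prompt: str, out_path: str) -> None:
    """Hypothetical: call a text-to-image model and save the result."""
    ...

def build_dataset(task_name: str, n: int = 10_000) -> list[Example]:
    examples = []
    for i, prompt in enumerate(propose_prompts(task_name, n)):
        path = f"data/{task_name}/{i:06d}.png"
        generate_image(prompt, path)
        examples.append(Example(prompt, path))
    return examples
```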