Tianyuan Zhang (@tianyuanzhang99)'s Twitter Profile
Tianyuan Zhang

@tianyuanzhang99

PhD student at @MIT, working on vision and ML. M.S. from CMU, B.S. from PKU

ID: 905014077126213632

Link: http://tianyuanzhang.com · Joined: 05-09-2017 10:25:45

128 Tweets

963 Followers

784 Following

Tianyuan Zhang (@tianyuanzhang99)'s Twitter Profile Photo

Check out Tianwei's fast autoregressive video diffusion. A promising step towards real-time interactive video generation!

Tianyuan Zhang (@tianyuanzhang99)'s Twitter Profile Photo

An image of an object tells us more than the object's visual geometry; it's also a physical snapshot of the object in a state of static equilibrium. Can we use that cue to extract more information about the object? Check out Minghao's work on this topic: PhysComp!
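To make the equilibrium cue concrete, here is a toy sketch (my own illustration with a hypothetical helper, not PhysComp's actual formulation): a rigid object resting on a plane is in static equilibrium only if its center of mass projects inside the support polygon of its contacts, so a photo of a stable object already constrains where its mass can be.

```python
import numpy as np
from scipy.spatial import Delaunay

def is_statically_stable(com_xy, contact_xy):
    """Toy static-equilibrium test: the ground projection of the center
    of mass must fall inside the convex hull of the contact points."""
    return Delaunay(contact_xy).find_simplex(com_xy) >= 0

# A box on four corners is stable iff its COM projects inside them.
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(is_statically_stable(np.array([0.5, 0.5]), corners))  # True
print(is_statically_stable(np.array([1.4, 0.5]), corners))  # False: tips over
```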

Jia-Bin Huang (@jbhuang0604)'s Twitter Profile Photo

The slide is bad; her response to an audience member is even worse: "Maybe there is one, maybe they are common, who knows what. I hope it was an outlier."

Yuandong Tian (@tydsh)'s Twitter Profile Photo

Unbelievable... This is explicit racial bias. How could this happen at NeurIPS? How could this be spoken by a top university professor, an invited keynote speaker?

Hongjie Wang (@hongjiewang3)'s Twitter Profile Photo

🎉Excited to introduce our latest work, LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity! ✨For the first time, we demonstrate high-resolution 68-second video generation at 16fps on a single GPU, without relying on
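For intuition on where linear complexity can come from (a generic kernelized-attention sketch, not LinGen's actual architecture): replacing softmax attention with linear attention lets all N tokens be summarized into a d×d state, so cost grows as O(N·d²) rather than O(N²·d), which is what makes minute-length, high-resolution token sequences affordable.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic (non-causal) linear attention: O(N * d^2) instead of O(N^2 * d).
    Uses the positive feature map elu(x)+1 from Katharopoulos et al.;
    this is the complexity idea only, not LinGen's exact token mixer."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)                   # (N, d) features
    KV = Kp.T @ V                             # (d, d): one pass over all tokens
    Z = Qp @ Kp.sum(axis=0) + eps             # (N,): per-query normalizer
    return (Qp @ KV) / Z[:, None]

N, d = 4096, 64                               # N can grow at only linear cost
Q, K, V = (np.random.randn(N, d) for _ in range(3))
print(linear_attention(Q, K, V).shape)        # (4096, 64)
```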

Hanwen Jiang (@hanwenjiang1)'s Twitter Profile Photo

💥 Think more real data is needed for scene reconstruction? Think again! Meet MegaSynth: scaling up feed-forward 3D scene reconstruction with synthesized scenes. In 3 days, it generates 700K scenes for training—70x larger than real data! ✨ The secret? Reconstruction is mostly
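As a toy picture of why synthesized scenes scale so cheaply (a hypothetical stand-in, nothing from MegaSynth's real pipeline): procedurally sampling random textured primitives needs no capture rigs and no annotation, so generating hundreds of thousands of scenes is just a loop.

```python
import numpy as np

def synth_scene(rng, n_shapes=12):
    """Sample a toy 'scene' of random colored boxes (a stand-in for
    MegaSynth-style scene synthesis; the real pipeline is more elaborate).
    Cheap to generate at scale: no capture, no labels."""
    return [{
        "center": rng.uniform(-1, 1, size=3),
        "size": rng.uniform(0.05, 0.4, size=3),
        "rotation_z": rng.uniform(0, 2 * np.pi),
        "albedo": rng.uniform(0, 1, size=3),   # random surface color
    } for _ in range(n_shapes)]

rng = np.random.default_rng(0)
scenes = [synth_scene(rng) for _ in range(1000)]  # scale the loop up to 700K
print(len(scenes), len(scenes[0]))                # 1000 12
```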

leloy! (@leloykun)'s Twitter Profile Photo

(Linear) Attention Mechanisms as Test-Time Regression

By now, you've probably already heard of linear attention, in-context learning, test-time scaling, etc...

Here, I'll discuss:

1. The unifying framework that ties them all together;
2. How to derive different linear
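A minimal version of that unifying view, in code (my own sketch, not the post's): treat the attention state W as a linear regressor fit at test time, one SGD step per token, on the key-to-value regression loss L(W) = ½‖Wk_t − v_t‖²; reading out Wq_t then recovers familiar linear-attention recurrences.

```python
import numpy as np

def test_time_regression_attention(keys, values, queries, lr=1.0):
    """Linear attention as test-time regression (minimal sketch).
    The state W is fit online to map keys -> values with one SGD step
    per token on L(W) = 0.5 * ||W k_t - v_t||^2; the output is W q_t.
    With lr=1 this is the delta-rule variant; dropping the 'W @ k'
    term from the gradient recovers vanilla linear attention."""
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))
    outputs = []
    for k, v, q in zip(keys, values, queries):
        grad = np.outer(W @ k - v, k)   # dL/dW at the current (k, v) pair
        W = W - lr * grad               # one test-time gradient step
        outputs.append(W @ q)           # read out the current regressor
    return np.array(outputs)

T, d = 16, 8
K, V, Q = (np.random.randn(T, d) for _ in range(3))
print(test_time_regression_attention(K, V, Q).shape)  # (16, 8)
```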
Yinbo Chen (@yinbochen)'s Twitter Profile Photo

Introducing “Diffusion Autoencoders are Scalable Image Tokenizers” (DiTo).

We show that with proper designs and scaling up, diffusion autoencoders (a single L2 loss) can outperform the GAN-LPIPS tokenizers (hybrid losses) used in current SOTA generative models. (1/4)
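Schematically, the single-loss recipe looks like this (the stub encoder/denoiser below are hypothetical placeholders, not DiTo's learned networks): noise the input, condition the denoiser on the encoder's latent, and train everything end to end with one L2 noise-prediction loss, with no GAN or LPIPS terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):          # stand-in tokenizer encoder (hypothetical stub)
    return x.mean(axis=(1, 2))                    # (B, C) "latent"

def denoiser(x_t, t, z): # stand-in conditional denoiser (hypothetical stub)
    return x_t * 0.0 + z[:, None, None, :] * t[:, None, None, None]

def diffusion_autoencoder_loss(x):
    """Schematic diffusion-autoencoder objective (not DiTo's exact nets):
    corrupt the input, condition a denoiser on the encoder's latent, and
    train with a single L2 noise-prediction loss."""
    B = x.shape[0]
    t = rng.uniform(size=B)                            # diffusion times
    eps = rng.normal(size=x.shape)                     # target noise
    x_t = np.sqrt(1 - t)[:, None, None, None] * x \
        + np.sqrt(t)[:, None, None, None] * eps        # noised input
    z = encoder(x)                                     # latent to decode from
    eps_hat = denoiser(x_t, t, z)
    return np.mean((eps_hat - eps) ** 2)               # the single L2 loss

x = rng.normal(size=(4, 32, 32, 3))
print(diffusion_autoencoder_loss(x))
```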
Tianyuan Zhang (@tianyuanzhang99)'s Twitter Profile Photo

Very interesting work from my MIT officemates! Diffusion Forcing with History Guidance introduces a novel approach to video generation, excelling at ultra-long sequences: 800+ frames shown in the paper!
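As a rough sketch of the guidance idea (classifier-free-guidance-style extrapolation on history conditioning; the function below is hypothetical, not the paper's code):

```python
import numpy as np

def history_guided_denoise(eps_cond, eps_uncond, w=1.5):
    """CFG-style history guidance (schematic only): extrapolate the
    denoiser prediction conditioned on past frames away from the
    history-free prediction, steering long autoregressive rollouts
    to stay consistent with what was already generated."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# eps_cond / eps_uncond would come from one denoiser evaluated with and
# without the history frames in its context (one video frame each here).
eps_cond, eps_uncond = np.random.randn(2, 64, 64, 3)
print(history_guided_denoise(eps_cond, eps_uncond).shape)  # (64, 64, 3)
```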

Yilun Xu (@xuyilun2)'s Twitter Profile Photo

Tired of slow diffusion models? Our new paper introduces f-distill, enabling arbitrary f-divergence for one-step diffusion distillation. JS divergence gives SOTA results on text-to-image! Choose the divergence that suits your needs. Joint work with Weili Nie and Arash Vahdat. 1/N
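For reference, the knob being exposed: an f-divergence is D_f(p‖q) = E_q[f(p/q)] for a convex generator f with f(1) = 0, and different generators weight the teacher/student mismatch differently. A quick numeric illustration on toy discrete distributions (nothing from the paper's implementation):

```python
import numpy as np

# f-divergences D_f(p || q) = E_q[f(p/q)] over a discrete support.
# Different generators f give different training signals; f-distill's
# point is that the distillation loss can use any of them.
f_generators = {
    "forward-KL": lambda r: r * np.log(r),
    "reverse-KL": lambda r: -np.log(r),
    "JS": lambda r: 0.5 * (r * np.log(2 * r / (r + 1)) + np.log(2 / (r + 1))),
}

def f_divergence(p, q, f):
    r = p / q                 # density ratio on each support point
    return np.sum(q * f(r))   # expectation under q

p = np.array([0.5, 0.3, 0.2])  # "teacher" distribution
q = np.array([0.4, 0.4, 0.2])  # "student" distribution
for name, f in f_generators.items():
    print(f"{name}: {f_divergence(p, q, f):.4f}")
```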

Tianwei Yin (@tianweiy)'s Twitter Profile Photo

Super excited to share that I’ve officially defended my PhD, wrapped up an incredible journey at Massachusetts Institute of Technology (MIT) and Adobe Research, and joined Reve! Thrilled to be working alongside the same amazing founders I teamed up with back in the Adobe days. That experience gave me deep

Hong-Xing "Koven" Yu (@koven_yu) 's Twitter Profile Photo

🔥Want to capture 3D dancing fluids♨️🌫️🌪️💦? No specialized equipment, just one video! Introducing FluidNexus: Now you only need one camera to reconstruct 3D fluid dynamics and predict future evolution! 🧵1/4 Web: yuegao.me/FluidNexus/ Arxiv: arxiv.org/pdf/2503.04720

Ken Liu (@kenziyuliu)'s Twitter Profile Photo

An LLM generates an article verbatim—did it “train on” the article? It’s complicated: under n-gram definitions of train-set inclusion, LLMs can complete “unseen” texts—both after data deletion and adding “gibberish” data. Our results impact unlearning, MIAs & data transparency🧵
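To see how such a definition can be probed, here is one common n-gram inclusion criterion, sketched (hypothetical helper; the paper's exact definitions may differ):

```python
def ngram_inclusion(text, corpus_docs, n=4):
    """One n-gram notion of train-set 'inclusion' (a sketch, not the
    paper's exact criterion): a text counts as included if every one of
    its word n-grams appears somewhere in the corpus. Deleting a doc or
    adding 'gibberish' docs can flip this verdict without changing what
    the model can actually complete -- the tweet's point."""
    words = text.split()
    grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    corpus_grams = set()
    for doc in corpus_docs:
        w = doc.split()
        corpus_grams |= {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}
    return grams <= corpus_grams

docs = ["the quick brown fox jumps over the lazy dog every day"]
print(ngram_inclusion("quick brown fox jumps over the lazy dog", docs))  # True
```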

Hong-Xing "Koven" Yu (@koven_yu) 's Twitter Profile Photo

🔥Spatial intelligence requires world generation, and now we have the first comprehensive evaluation benchmark📏 for it! Introducing WorldScore: Unifying evaluation for 3D, 4D, and video models on world generation! 🧵1/7 Web: haoyi-duan.github.io/WorldScore/ arxiv: arxiv.org/abs/2504.00983

Haian Jin (@haian_jin)'s Twitter Profile Photo

Excited to attend #ICLR2025 in person this year! I’ll be presenting two papers: 1. LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias 🔹 Oral Presentation: Session 3C (Garnet 216-218) — Apr 25 (Fri), 11:06–11:18 a.m. 🔹 Poster: Hall 3 + Hall 2B, Poster #593 — Apr