shuo yang (@randwalk0) Twitter Tweets • TwiCopy

Enze Xie

3 months ago

🚀 SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos 💥 Key Features 🌟 🧠 Linear DiT everywhere → O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache → store cumulative states only (no growing KV) 🔄 🎯 Temporal Mix-FFN + 3D RoPE

Likes: 125 · Replies: 3 · Retweets: 20
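
The "store cumulative states only (no growing KV)" idea in the post above can be illustrated with a minimal, generic sketch of causal linear attention. This is not SANA-Video's implementation: the class `LinearAttentionState`, the helper `naive_attention`, and the `elu(x) + 1` feature map are assumptions chosen for illustration, and the sketch works per token rather than per block. It only shows why the state stays the same size no matter how many tokens have been seen.

```python
import math

def feature_map(x):
    """elu(x) + 1: keeps features positive (an assumed, commonly used choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

class LinearAttentionState:
    """Fixed-size running state: S = sum phi(k) v^T and z = sum phi(k)."""

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def update(self, k, v):
        # Fold one (key, value) pair into the state; nothing is appended.
        for i, fk in enumerate(feature_map(k)):
            self.z[i] += fk
            for j, vj in enumerate(v):
                self.S[i][j] += fk * vj

    def read(self, q):
        # Output for query q over every pair folded in so far.
        # Call only after at least one update, otherwise denom is zero.
        fq = feature_map(q)
        denom = sum(a * b for a, b in zip(fq, self.z))
        return [sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
                for j in range(len(self.S[0]))]

    def num_floats(self):
        # Size of the state; depends on d_k and d_v only, not on sequence length.
        return len(self.S) * len(self.S[0]) + len(self.z)

def naive_attention(q, keys, values):
    """Reference version that keeps every key and value (a growing cache)."""
    fq = feature_map(q)
    weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]
```

Feeding the same keys and values through `update` and then calling `read` should match `naive_attention` up to floating-point error, while `num_floats()` stays fixed as more tokens arrive.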

Han Cai

@hancai_hm

3 months ago

Changing the autoencoder in latent diffusion models is easier than you think. 🚀 Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with

Likes: 157 · Replies: 1 · Retweets: 29

Chenfeng_X

@chenfeng_x

2 months ago

🥳We’re releasing StreamDiffusionV2 for the live-stream community—from individual creators with one GPU to enterprise platforms with many. StreamDiffusionV2 is our follow-up to StreamDiffusion: #StreamDiffusion powered real products, but temporal consistency still bugged us.

Likes: 204 · Replies: 11 · Retweets: 43

Melissa Pan

@melissapan

2 months ago

AI is going to replace researchers? 🙀 AI is going to replace PhD students? 🙀 AI is going to take all of our jobs? 🙀 NO! But… 🚨AI is upending systems research 🚨 We show that by leveraging AI-driven research systems (ADRS), we can drastically accelerate the algorithm

Likes: 82 · Replies: 2 · Retweets: 11

Together AI

@togethercompute

2 months ago

What if your LLM inference automatically got faster the more you used it? Introducing ATLAS from the Together AI Turbo research team. Read more: togetherai.link/atlas Here’s Together AI Founder and Chief Scientist Tri Dao introducing ATLAS:

Likes: 267 · Replies: 9 · Retweets: 37

Wenjie Ma

@wenjie_ma

2 months ago

LLMs solving math benchmarks with verifiable answers like AIME? ✅ LLMs solving math proofs? ❌ Still an open problem. RL works great for final-answer problems, but proofs are different: - Often no single checkable answer - Correct answers can hide flawed reasoning The key

Likes: 187 · Replies: 9 · Retweets: 37

Yifan Qiao

@yifanqiao_ucla

2 months ago

🚀 End the GPU Cost Crisis Today!!! Headache with LLMs that lock a whole GPU but leave capacity idle? Frustrated by your cluster's low utilization? We launch kvcached, the first library for elastic GPU sharing across LLMs. 🔗 github.com/ovg-project/kv… 🧵👇 Why it matters:

Likes: 198 · Replies: 9 · Retweets: 53

Yu-Xiang Wang

@yuxiangw_cs

2 months ago

🚀 We just set a new SOTA for LLM inference acceleration with speculative decoding. By corralling a band of specialist drafters, we got 4.99× on Llama-3.1-8B-Instruct, 4.93× on Qwen-32B — beating EAGLE3 by nearly 2x. No gimmicks. Just careful math + solid engineering. 🧵1/

Likes: 321 · Replies: 13 · Retweets: 50

Shiyi Cao

@shiyi_c98

23 days ago

1/n 🚀 Introducing SkyRL-Agent, a framework for efficient RL agent training. ⚡ 1.55× faster async rollout dispatch 🛠 Lightweight tool + task integration 🔄 Backend-agnostic (SkyRL-train / VeRL / Tinker) 🏆 Used to train SA-SWE-32B, improving Qwen3-32B from 24.4% → 39.4%

Likes: 118 · Replies: 3 · Retweets: 29

Haocheng Xi

@haochengxiucb

19 days ago

🎉 Come check out our Spotlight Poster @Neurips 2025! 🚀 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation 📍 Exhibit Hall C,D,E — #3508 🗓️ Fri, Dec 5 | 🕓 4:30–7:30 PM PST ⚡ Sparse VideoGen2 boosts video generation efficiency

Likes: 34 · Replies: 1 · Retweets: 8

Xingyang Li

@xyli_bruce

16 days ago

See you at our poster session at NeurIPS Conference! 🎉 Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation 4:30 - 7:30pm, Exhibit Hall, CDE, id = 5414 Come talk with us if you are interested in efficient ML & VideoGen 🥳

Likes: 12 · Replies: 1 · Retweets: 4

Melissa Pan

@melissapan

15 days ago

Thrilled to release our new paper MAP: Measuring Agents in Production ⚙️🚀 2025 is the year of agents… but do they actually work in the real world? Is it just hype? A group of 25 researchers from Berkeley, Stanford, UIUC, IBM, and Intesa Sanpaolo investigated what makes agents

Likes: 435 · Replies: 14 · Retweets: 95

Ying Sheng

@ying11231

12 days ago

We've been running RadixArk for a few months, started by many core developers in SGLang LMSYS Org and its extended ecosystem (slime slime , AReaL Yi Wu). I left xAI in August — a place where I built deep emotions and countless beautiful memories. It was the best

Likes: 1.1K · Replies: 107 · Retweets: 122

Tete

@winterice10

5 days ago

TurboDiffusion: 100–205× faster video generation on a single RTX 5090 🚀 Only takes 1.8s to generate a high-quality 5-second video. The key to both high speed and high quality? 😍SageAttention + Sparse-Linear Attention (SLA) + rCM Github: github.com/thu-ml/TurboDi… Technical

Likes: 818 · Replies: 27 · Retweets: 150

Huanzhi Mao

@huanzhimao

2 days ago

Pass/fail benchmarks are saturated. It’s time for FrontierCS. 🚀 150+ unsolved, verifiable problems ranging from competitive programming to real-world research. Designed by PhDs & ICPC experts to evolve model intelligence. 🎓🧠 🧵👇Check it out! Paper: arxiv.org/abs/2512.15699

Likes: 217 · Replies: 6 · Retweets: 36

Wentao Guo

@wentaoguo7

2 days ago

🚀SonicMoE🚀: a blazingly-fast MoE implementation optimized for NVIDIA Hopper GPUs. SonicMoE reduces activation memory by 45% and is 1.86x faster on H100 than previous SOTA😃 Paper: arxiv.org/abs/2512.14080 Work with Mayank Mishra, Xinle Cheng, Ion Stoica, Tri Dao

Likes: 553 · Replies: 18 · Retweets: 95