shuo yang (@randwalk0)'s Twitter Profile
shuo yang

@randwalk0

ID: 1629407863440539649

25-02-2023 09:08:19

9 Tweets

37 Followers

51 Following

Enze Xie (@xieenze_jr)

🚀 SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos 💥 Key Features 🌟 🧠 Linear DiT everywhere → O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache → store cumulative states only (no growing KV) 🔄 🎯 Temporal Mix-FFN + 3D RoPE

Han Cai (@hancai_hm)

Changing the autoencoder in latent diffusion models is easier than you think. 🚀 Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with

Chenfeng_X (@chenfeng_x)

🥳We’re releasing StreamDiffusionV2 for the live-stream community, from individual creators with one GPU to enterprise platforms with many. StreamDiffusionV2 is our follow-up to StreamDiffusion: #StreamDiffusion powered real products, but temporal consistency still bugged us.

Melissa Pan (@melissapan)

AI is going to replace researchers? 🙀 AI is going to replace PhD students? 🙀 AI is going to take all of our jobs? 🙀 NO! But… 🚨AI is upending systems research 🚨 We show that by leveraging AI-driven research systems (ADRS), we can drastically accelerate the algorithm

Together AI (@togethercompute)

What if your LLM inference automatically got faster the more you used it? Introducing ATLAS from the Together AI Turbo research team. Read more: togetherai.link/atlas Here’s Together AI Founder and Chief Scientist Tri Dao introducing ATLAS:

Wenjie Ma (@wenjie_ma)

LLMs solving math benchmarks with verifiable answers like AIME? ✅ LLMs solving math proofs? ❌ Still an open problem. RL works great for final-answer problems, but proofs are different: - Often no single checkable answer - Correct answers can hide flawed reasoning The key

Yifan Qiao (@yifanqiao_ucla)

🚀 End the GPU Cost Crisis Today!!! Headache with LLMs lock a whole GPU but leave capacity idle? Frustrated by your cluster's low utilization? We launch kvcached, the first library for elastic GPU sharing across LLMs. 🔗 github.com/ovg-project/kv… 🧵👇 Why it matters:

Yu-Xiang Wang (@yuxiangw_cs)

🚀 We just set a new SOTA for LLM inference acceleration with speculative decoding. By corralling a band of specialist drafters, we got 4.99× on Llama-3.1-8B-Instruct, 4.93× on Qwen-32B, beating EAGLE3 by nearly 2x. No gimmicks. Just careful math + solid engineering. 🧵1/

Shiyi Cao (@shiyi_c98)

1/n 🚀 Introducing SkyRL-Agent, a framework for efficient RL agent training. ⚡ 1.55× faster async rollout dispatch 🛠 Lightweight tool + task integration 🔄 Backend-agnostic (SkyRL-train / VeRL / Tinker) 🏆 Used to train SA-SWE-32B, improving Qwen3-32B from 24.4% → 39.4%

Haocheng Xi (@haochengxiucb)

🎉 Come check out our Spotlight Poster @Neurips 2025! 🚀 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation 📍 Exhibit Hall C,D,E - #3508 🗓️ Fri, Dec 5 | 🕓 4:30–7:30 PM PST ⚡ Sparse VideoGen2 boosts video generation efficiency

Xingyang Li (@xyli_bruce)

See you at our poster session at NeurIPS Conference! 🎉 Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation 4:30 - 7:30pm, Exhibit Hall, CDE, id = 5414 Come talk with us if you are interested in efficient ML & VideoGen 🥳

Melissa Pan (@melissapan)

Thrilled to release our new paper MAP: Measuring Agents in Production ⚙️🚀 2025 is the year of agents… but do they actually work in the real world? Is it just hype? A group of 25 researchers from Berkeley, Stanford, UIUC, IBM, and Intesa Sanpaolo investigated what makes agents

Ying Sheng (@ying11231)

We've been running RadixArk for a few months, started by many core developers in SGLang LMSYS Org and its extended ecosystem (slime slime , AReaL Yi Wu). I left xAI in August - a place where I built deep emotions and countless beautiful memories. It was the best

Tete (@winterice10)

TurboDiffusion: 100–205× faster video generation on a single RTX 5090 🚀 Only takes 1.8s to generate a high-quality 5-second video. The key to both high speed and high quality? 😍SageAttention + Sparse-Linear Attention (SLA) + rCM Github: github.com/thu-ml/TurboDi… Technical

Huanzhi Mao (@huanzhimao)

Pass/fail benchmarks are saturated. It’s time for FrontierCS. 🚀 150+ unsolved, verifiable problems ranging from competitive programming to real-world research. Designed by PhDs & ICPC experts to evolve model intelligence. 🎓🧠 🧵👇Check it out! Paper: arxiv.org/abs/2512.15699

Wentao Guo (@wentaoguo7)

🚀SonicMoE🚀: a blazingly-fast MoE implementation optimized for NVIDIA Hopper GPUs. SonicMoE reduces activation memory by 45% and is 1.86x faster on H100 than previous SOTA😃 Paper: arxiv.org/abs/2512.14080 Work with Mayank Mishra, Xinle Cheng, Ion Stoica, Tri Dao
