Guangxuan Xiao (@guangxuan_xiao)'s Twitter Profile
Guangxuan Xiao

@guangxuan_xiao

Ph.D. student at @MITEECS. Prev: CS & Finance @Tsinghua_Uni

ID: 1230368227336654849

Link: https://guangxuanx.com

Joined: 20-02-2020 05:47:08

87 Tweets

1.1K Followers

556 Following

Chenyu Wang (@chenyuw64562111)'s Twitter Profile Photo

Excited to share: "Fine-tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design"

With my amazing coauthors Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, @lal_avantika, Tommi Jaakkola, Sergey Levine, Hanchen Wang, Aviv Regev
机器之心 JIQIZHIXIN (@synced_global)'s Twitter Profile Photo

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads arxiv.org/abs/2410.10819 github.com/mit-han-lab/du… #MIT Song Han
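
For context: DuoAttention's core idea is to classify attention heads as retrieval heads, which need the full KV cache, or streaming heads, which only need the attention sinks plus recent tokens. Below is a minimal sketch of the resulting per-head KV keep-mask, assuming the head classification is already given (in the paper it is learned via an optimization procedure); `duo_kv_mask` and its parameters are illustrative names, not the repo's API.

```python
import torch

def duo_kv_mask(is_retrieval, seq_len, n_sink=4, recent=256):
    # Boolean keep-mask over cached positions, one row per head.
    # Retrieval heads keep the full KV cache; streaming heads keep only
    # the initial "sink" tokens plus the most recent window.
    pos = torch.arange(seq_len)
    stream = (pos < n_sink) | (pos >= seq_len - recent)
    full = torch.ones(seq_len, dtype=torch.bool)
    return torch.stack([full if r else stream for r in is_retrieval])

# e.g. two retrieval heads and two streaming heads over a 4096-token cache
mask = duo_kv_mask([True, False, False, True], seq_len=4096)
```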

Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 The 4-bit era has arrived! Meet #SVDQuant, our new W4A4 quantization paradigm for diffusion models. Now, 12B FLUX can run on a 16GB 4090 laptop without offloading, with 3x speedups over W4A16 models (like NF4) while maintaining top-tier image quality. #AI #Quantization 1/7
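
The key trick in SVDQuant is to absorb weight outliers into a 16-bit low-rank branch obtained by SVD, so the residual has a much narrower dynamic range and quantizes cleanly to 4 bits. A minimal weight-side sketch follows; the real method also migrates activation outliers into the weights, quantizes activations to 4 bits, and fuses the two branches into one kernel, none of which is shown here, and the function names are illustrative.

```python
import torch

def svdquant_decompose(W, rank=32):
    # Split the weight into a 16-bit low-rank branch (absorbs outliers)
    # plus a 4-bit residual on a much narrower dynamic range.
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    L1 = U[:, :rank] * S[:rank]          # [out, rank]
    L2 = Vh[:rank]                       # [rank, in]
    R = W - L1 @ L2                      # residual weight
    scale = R.abs().max() / 7            # symmetric int4 grid: [-8, 7]
    Rq = (R / scale).round().clamp(-8, 7)
    return L1, L2, Rq, scale

def linear_forward(x, L1, L2, Rq, scale):
    # y = x W^T  ~=  (x L2^T) L1^T + x (dequantized residual)^T
    return (x @ L2.T) @ L1.T + x @ (Rq * scale).T
```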

Scale ML (@scaleml)'s Twitter Profile Photo

Hello everyone, this week at 3pm EST Nov 20 (Wed) we will be having Guangxuan Xiao present his work on efficient/effective long sequence modeling!

Sign up via scale-ml.org to join our mailing list and get Zoom access.
Tianwei Yin (@tianweiy)'s Twitter Profile Photo

Video diffusion models generate high-quality videos but are too slow for interactive applications. We (MIT CSAIL & Adobe Research) introduce CausVid, a fast autoregressive video diffusion model that starts playing the moment you hit "Generate"! A thread 🧵

Haocheng Xi (@haochengxiucb)'s Twitter Profile Photo

🚀 We're excited to open source an FP8 training technique, COAT: Compressing Optimizer states and Activation for memory-efficient FP8 Training. COAT is accepted by ICLR 2025!

FP8 training effectively improves training efficiency. DeepSeek-V3 is a successful example of
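
For intuition on the "compressing optimizer states" part: storing Adam moments in FP8 needs careful scaling, since e4m3 covers only about three decades of magnitude. A minimal sketch of per-group FP8 storage, assuming PyTorch's `torch.float8_e4m3fn` dtype and a tensor whose size divides evenly by the group size; COAT's actual method additionally applies dynamic range expansion before quantization, which is omitted here.

```python
import torch

def quantize_fp8(x, group=128):
    # Per-group scaling maps each group into the FP8 e4m3 range (max 448),
    # storing 1 byte per element plus one scale per group.
    g = x.reshape(-1, group)              # assumes numel % group == 0
    scale = g.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 448.0
    return (g / scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8(q, scale, shape):
    # Recover a float32 view of the state for the optimizer update.
    return (q.to(torch.float32) * scale).reshape(shape)
```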
Shang Yang (@shang_mit)'s Twitter Profile Photo

🎉 Excited to share that LServe, our efficient long-sequence LLM serving framework, is accepted by #MLSys'25! 🔥

⚡ Up to 2.9× faster prefilling & 1.3-2.1× faster decoding over vLLM
🔋 Hybrid attention kernels unifying static & dynamic sparsity

🔗 hanlab.mit.edu/projects/lserve (1/5)
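
The "dynamic sparsity" half refers to query-aware KV page selection: score each page of the KV cache cheaply from per-page key summaries and attend only to the top-scoring pages, while the static half fixes some heads to a sink-plus-local streaming pattern. Below is a sketch of the selection step in the spirit of the Quest-style upper bound that this line of work builds on; the names and shapes are assumptions, not LServe's actual API.

```python
import torch

def select_kv_pages(q, page_key_max, page_key_min, top_pages=64):
    # Dynamic sparsity: bound q.k over each KV page using channel-wise
    # per-page key max/min summaries, then keep the highest-scoring pages.
    # q: [d]; page_key_max, page_key_min: [n_pages, d]
    upper = torch.maximum(q * page_key_max, q * page_key_min).sum(dim=-1)
    k = min(top_pages, upper.numel())
    return upper.topk(k).indices
```

Because the bound is computed per page rather than per token, selection cost grows with the number of pages, not the sequence length, which is what keeps decoding fast at long contexts.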
Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 Meet #RadialAttention: a static sparse attention mechanism with O(n log n) complexity for long video generation!
✅ Plug-and-play: works with pretrained models like #Wan, #HunyuanVideo, #Mochi
✅ Speeds up both training & inference by 2-4×, without quality loss
🧵1/4
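
One way to picture an O(n log n) static sparse mask: a dense local band plus exponentially wider distance bands, each subsampled with a stride that doubles per band, so every query attends to O(window) tokens per band and O(window log n) tokens overall. The sketch below is an illustrative dyadic construction under that reading, not the exact RadialAttention mask, which decays along spatiotemporal distance in the video token layout.

```python
import torch

def radial_mask(n, window=64):
    # Band k covers distances [window * 2^(k-1), window * 2^k) and is
    # subsampled with stride 2^k, giving O(n log n) nonzeros in total.
    i = torch.arange(n).unsqueeze(1)
    j = torch.arange(n).unsqueeze(0)
    d = (i - j).abs()
    mask = d < window                     # dense local band
    k = 1
    while (window << k) <= 2 * n:
        band = (d >= (window << (k - 1))) & (d < (window << k))
        mask |= band & ((i - j) % (1 << k) == 0)   # strided sampling
        k += 1
    return mask
```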

Guangxuan Xiao (@guangxuan_xiao)'s Twitter Profile Photo

The release of the GPT-OSS-120B & GPT-OSS-20B models today incorporates my Attention Sink work (github.com/mit-han-lab/st…).

Exciting to see this come to life! 🎉 Looking forward to more progress in this space. 😁
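
The Attention Sink work (StreamingLLM) observed that models dump attention mass on the first few tokens, so evicting those "sinks" breaks generation. Its cache policy keeps the initial sink tokens plus a rolling recent window, giving constant memory on arbitrarily long streams. A minimal sketch of that eviction rule; the real implementation also re-indexes RoPE positions within the cache, which is omitted here.

```python
import torch

class SinkKVCache:
    # Keep the first n_sink tokens (attention sinks) plus a rolling
    # window of recent tokens; memory stays constant as the stream grows.
    def __init__(self, n_sink=4, window=1020):
        self.n_sink, self.window = n_sink, window
        self.k = self.v = None

    def append(self, k_new, v_new):  # tensors shaped [t, n_heads, head_dim]
        self.k = k_new if self.k is None else torch.cat([self.k, k_new])
        self.v = v_new if self.v is None else torch.cat([self.v, v_new])
        if self.k.shape[0] > self.n_sink + self.window:
            self.k = torch.cat([self.k[:self.n_sink], self.k[-self.window:]])
            self.v = torch.cat([self.v[:self.n_sink], self.v[-self.window:]])
        return self.k, self.v
```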
Graham Neubig (@gneubig)'s Twitter Profile Photo

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ YaRN (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: StreamingLLM, arxiv.org/abs/2309.17453)
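
On item 4: in GPT-OSS the sink is not a cached token but a learned per-head logit appended to the attention softmax, letting each head dump probability mass on "nothing" instead of being forced to attend somewhere. A minimal sketch under that reading (causal masking omitted for brevity; the names are illustrative, not the released code's API):

```python
import torch

def attention_with_sinks(q, k, v, sink_logits):
    # q, k, v: [n_heads, t, d]; sink_logits: [n_heads] learned scalars.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # Append the sink logit as an extra "column" that competes in the
    # softmax but contributes no value vector.
    sink = sink_logits.view(-1, 1, 1).expand(-1, scores.shape[1], 1)
    probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    return probs[..., :-1] @ v  # drop the sink column before mixing values
```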

Ryan Hanrui Wang (@hanrui_w)'s Twitter Profile Photo

Announcing Eigen AI, the world's first company dedicated to AEI (Artificial Efficient Intelligence). 🚀 The future of AI is already here; it's simply not evenly distributed. Our mission is to close that gap by driving radical efficiency so that every person and