Zirui Liu (@ziruirayliu) 's Twitter Profile
Zirui Liu

@ziruirayliu

Assistant Professor of CS @UMNComputerSci
| PhD @RiceUniversity

ID: 1770581561089359872

Link: https://zirui-ray-liu.github.io/ · Joined: 20-03-2024 22:41:59

62 Tweets

295 Followers

612 Following

Hao Zhang (@haozhangml) 's Twitter Profile Photo

Beyond thrilled 🚀 to see my lab's work DistServe (OSDI'24) just got featured in Jensen Huang's keynote at Nvidia GTC! This marks our third major breakthrough in LLM inference after PagedAttention (vLLM) and Lookahead Decoding — pushing the frontier yet again! Since we post the

Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general

Minghao Yan (@minghao__yan) 's Twitter Profile Photo

I will be presenting our work, Decoding Speculative Decoding, at NAACL tomorrow. We identified the performance bottleneck in speculative decoding to be draft model depth and demonstrated low correlation between language modeling performance and token acceptance rate.

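For context on the tweet above: in speculative decoding a small draft model proposes a few tokens and the large target model verifies them, so throughput hinges on the token acceptance rate. A toy Python sketch of the accept/reject rule (the context-free distributions and function names are illustrative, not from the paper; real systems verify all draft tokens in one batched target forward pass):

```python
import random

def speculative_step(draft_p, target_p, vocab, k=4, rng=random):
    """One toy speculative-decoding step (no KV cache, no batching).

    draft_p / target_p: callables mapping a context tuple to a dict
    {token: prob}. The draft proposes k tokens autoregressively; the
    target accepts each with prob min(1, p_target / p_draft), the
    standard speculative-sampling acceptance rule (the resampling
    correction on rejection is omitted here).
    """
    ctx, proposed = (), []
    for _ in range(k):                      # draft proposes k tokens
        probs = draft_p(ctx)
        tok = rng.choices(vocab, weights=[probs[t] for t in vocab])[0]
        proposed.append(tok)
        ctx = ctx + (tok,)
    accepted, ctx = [], ()
    for tok in proposed:                    # target verifies in order
        p_t, p_d = target_p(ctx)[tok], draft_p(ctx)[tok]
        if rng.random() < min(1.0, p_t / p_d):
            accepted.append(tok)
            ctx = ctx + (tok,)
        else:
            break                           # first rejection ends the step
    return accepted

# Toy context-free distributions: acceptance depends on how well the
# draft matches the target, not on raw language-modeling quality --
# which is the low-correlation point the tweet makes.
vocab = ["a", "b"]
draft = lambda ctx: {"a": 0.5, "b": 0.5}
target = lambda ctx: {"a": 0.9, "b": 0.1}
print(len(speculative_step(draft, target, vocab, k=4, rng=random.Random(0))))
```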
Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

There are many fast matmul algorithms that beat the standard N^3 complexity. In practice, their biggest problems are (1) numerical instability and (2) cache unfriendliness.
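For reference, a toy Strassen recursion (illustrative only; square power-of-two sizes) shows where both the speedup and the problems come from: 7 recursive multiplies instead of 8, paid for with extra additions (larger rounding-error growth) and scattered sub-block accesses (poor cache behavior):

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Toy Strassen multiply for square power-of-two matrices.

    7 recursive multiplies instead of 8 gives O(n^2.807) arithmetic,
    but the extra +/- combinations amplify rounding error and the
    sub-block slicing is cache-unfriendly versus a tiled kernel.
    """
    n = A.shape[0]
    if n <= leaf:                       # fall back to the standard product
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
print(np.max(np.abs(strassen(A, B) - A @ B)))  # max deviation from np.matmul
```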

Huawei Lin (@huaweilin7) 's Twitter Profile Photo

Why does autoregressive (AR) image generation still suck — even in the era of LLMs? The core issue might be the Visual Tokenizer (VT). We introduce VTBench, a comprehensive benchmark designed to evaluate 20+ state-of-the-art VTs in the context of AR image generation. Our

Saining Xie (@sainingxie) 's Twitter Profile Photo

Had a great time at this CVPR community-building workshop---lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…

Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

🔥Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,
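The mechanism behind this: floating-point addition is not associative, so when a kernel reduces the same numbers in a different order (e.g. a different batch size picks a different kernel split), low-precision sums change and greedy argmax can flip. A minimal NumPy demonstration (fp16 stands in for BF16, which NumPy lacks):

```python
import numpy as np

# Floating-point addition is not associative: summing the same three
# numbers in a different order gives different fp16 results.
a, b, c = np.float16(2048), np.float16(1), np.float16(-2048)
print((a + b) + c)  # 0.0 -- the 1 is lost when added to 2048 first
print(a + (b + c))  # 1.0

# The same effect on logits: two runs whose kernels reduce in different
# orders can land on different values, so greedy argmax picks a
# different token and the generations diverge from there.
logits_run1 = np.array([(a + b) + c, np.float16(0.5)])
logits_run2 = np.array([a + (b + c), np.float16(0.5)])
print(np.argmax(logits_run1), np.argmax(logits_run2))  # 1 vs 0
```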

Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

Seems like numerical precision not only hurts inference, it impacts RL training, too. The fix proposed in MiniMax is using FP32 for the LM head layer. I actually tried this trick before (and also an FP32 KV cache). They indeed alleviate the problem, but still have
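A sketch of that fix, with hypothetical shapes and fp16 standing in for BF16: keep the model in half precision but run the final LM-head matmul in FP32, so the logits that greedy decoding (and RL importance ratios) depend on are not rounded to half precision:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((1, 1024)).astype(np.float16)     # final hidden state
W = rng.standard_normal((1024, 4096)).astype(np.float16)  # hypothetical LM head

logits_half = h @ W                                        # half-precision path
logits_fp32 = h.astype(np.float32) @ W.astype(np.float32)  # "FP32 LM head" path

# The half-precision logits are rounded relative to the FP32 ones;
# near-tied logits can then flip argmax between runs, which is the
# nondeterminism the FP32 head reduces.
print(np.max(np.abs(logits_half.astype(np.float32) - logits_fp32)))
```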

Zhihao Jia (@jiazhihao) 's Twitter Profile Photo

📢Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org. We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to

Wentao Guo (@wentaoguo7) 's Twitter Profile Photo

🦆🚀QuACK🦆🚀: new SOL mem-bound kernel library without a single line of CUDA C++ all straight in Python thanks to CuTe-DSL. On H100 with 3TB/s, it performs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With Ted Zadouri and Tri Dao

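"SOL" (speed of light) above means the kernel runs at the memory-bandwidth lower bound: a memory-bound kernel cannot finish faster than bytes moved divided by peak HBM bandwidth. A back-of-the-envelope sketch (the 3 TB/s figure is from the tweet; the shapes and function are illustrative):

```python
def sol_time_us(bytes_moved, peak_bw=3e12):
    """Speed-of-light time (microseconds) for a memory-bound kernel:
    total bytes moved over HBM divided by peak bandwidth (3 TB/s,
    the H100 figure quoted in the tweet)."""
    return bytes_moved / peak_bw * 1e6

# Hypothetical elementwise kernel over a (32768, 8192) bf16 tensor:
# read x once, write y once -> 2 * numel * 2 bytes over HBM.
numel = 32768 * 8192
bytes_moved = 2 * numel * 2
print(round(sol_time_us(bytes_moved), 1))  # ~357.9 us at 3 TB/s
```

Measured time divided by this bound gives the fraction of speed of light a kernel achieves, which is the metric libraries like this compete on.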
Denghui Zhang (@denghui_zhang) 's Twitter Profile Photo

We’re grateful that our recent work on the Theory-of-Mind of LLMs was featured by MIT Technology Review China 🙏 🔗 mittrchina.com/news/detail/15… In two new preprints, we explore: Sparsity and ToM: How extremely sparse patterns in LLMs shape their ability to infer others’ beliefs and

Forbes (@forbes) 's Twitter Profile Photo

This computer science professor became a billionaire launching four startups out of his privately-funded research lab, including unicorns Databricks and Anyscale. But it’s never been just about business. (Photo: Timothy Archibald for Forbes) trib.al/xyBDRVN

Tianqi Chen (@tqchenml) 's Twitter Profile Photo

MLSys infrastructure (compilers, inference engines, runtimes, GPU acceleration, and more) is at the heart of today's AI revolution, and AI has the potential to empower the systems revolution itself. #MLSys2026 launches an inaugural industry track; consider submitting your paper!

Denghui Zhang (@denghui_zhang) 's Twitter Profile Photo

Interpretability: Understanding how AI models think youtu.be/fGKNUvivvnc?si… via YouTube (Anthropic). Anthropic’s new video dives into AI interpretability—how models think & why it matters 🧠✨ Our EMNLP paper SafeSwitch takes a similar path: leveraging internal activations