Zirui Liu (@ziruirayliu) 's Twitter Profile
Zirui Liu

@ziruirayliu

Assistant Professor of CS @UMNComputerSci
| PhD @RiceUniversity

ID: 1770581561089359872

Link: https://zirui-ray-liu.github.io/ · Joined: 20-03-2024 22:41:59

62 Tweets

295 Followers

612 Following

Hao Zhang (@haozhangml) 's Twitter Profile Photo

Beyond thrilled 🚀 to see my lab's work DistServe (OSDI'24) just got featured in Jensen Huang's keynote at Nvidia GTC! This marks our third major breakthrough in LLM inference after PagedAttention (vLLM) and Lookahead Decoding — pushing the frontier yet again! Since we post the

Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general

Minghao Yan (@minghao__yan) 's Twitter Profile Photo

I will be presenting our work, Decoding Speculative Decoding, at NAACL tomorrow. We identified the performance bottleneck in speculative decoding to be draft model depth and demonstrated low correlation between language modeling performance and token acceptance rate.

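For context on the tweet above: in speculative decoding a small draft model proposes a few tokens and the large target model verifies them, so throughput hinges on the token acceptance rate. A toy Python sketch of the accept/reject rule (the context-free distributions and function names are illustrative, not from the paper; real systems verify all draft tokens in one batched target forward pass):

```python
import random

def speculative_step(draft_p, target_p, vocab, k=4, rng=random):
    """One toy speculative-decoding step (no KV cache, no batching).

    draft_p / target_p: callables mapping a context tuple to a dict
    {token: prob}. The draft proposes k tokens autoregressively; the
    target accepts each with prob min(1, p_target / p_draft), the
    standard speculative-sampling acceptance rule (the resampling
    correction on rejection is omitted here).
    """
    ctx, proposed = (), []
    for _ in range(k):                      # draft proposes k tokens
        probs = draft_p(ctx)
        tok = rng.choices(vocab, weights=[probs[t] for t in vocab])[0]
        proposed.append(tok)
        ctx = ctx + (tok,)
    accepted, ctx = [], ()
    for tok in proposed:                    # target verifies in order
        p_t, p_d = target_p(ctx)[tok], draft_p(ctx)[tok]
        if rng.random() < min(1.0, p_t / p_d):
            accepted.append(tok)
            ctx = ctx + (tok,)
        else:
            break                           # first rejection ends the step
    return accepted

# Toy context-free distributions: acceptance depends on how well the
# draft matches the target, not on raw language-modeling quality --
# which is the low-correlation point the tweet makes.
vocab = ["a", "b"]
draft = lambda ctx: {"a": 0.5, "b": 0.5}
target = lambda ctx: {"a": 0.9, "b": 0.1}
print(len(speculative_step(draft, target, vocab, k=4, rng=random.Random(0))))
```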
Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

There are many fast matmul algorithms that beat the standard N^3 complexity. In practice, their biggest problems are (1) numerical instability and (2) cache unfriendliness.
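For reference, a toy Strassen recursion (illustrative only; square power-of-two sizes) shows where both the speedup and the problems come from: 7 recursive multiplies instead of 8, paid for with extra additions (larger rounding-error growth) and scattered sub-block accesses (poor cache behavior):

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Toy Strassen multiply for square power-of-two matrices.

    7 recursive multiplies instead of 8 gives O(n^2.807) arithmetic,
    but the extra +/- combinations amplify rounding error and the
    sub-block slicing is cache-unfriendly versus a tiled kernel.
    """
    n = A.shape[0]
    if n <= leaf:                       # fall back to the standard product
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
print(np.max(np.abs(strassen(A, B) - A @ B)))  # max deviation from np.matmul
```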

Huawei Lin (@huaweilin7) 's Twitter Profile Photo

Why does autoregressive (AR) image generation still suck — even in the era of LLMs? The core issue might be the Visual Tokenizer (VT). We introduce VTBench, a comprehensive benchmark designed to evaluate 20+ state-of-the-art VTs in the context of AR image generation. Our

Saining Xie (@sainingxie) 's Twitter Profile Photo

Had a great time at this CVPR community-building workshop---lots of fun discussions and some really important insights for early-career researchers. I also gave a talk on "Research as an Infinite Game." Here are the slides: canva.com/design/DAGp0iR…

Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

🔥Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,
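The mechanism behind this: floating-point addition is not associative, so when a kernel reduces the same numbers in a different order (e.g. a different batch size picks a different kernel split), low-precision sums change and greedy argmax can flip. A minimal NumPy demonstration (fp16 stands in for BF16, which NumPy lacks):

```python
import numpy as np

# Floating-point addition is not associative: summing the same three
# numbers in a different order gives different fp16 results.
a, b, c = np.float16(2048), np.float16(1), np.float16(-2048)
print((a + b) + c)  # 0.0 -- the 1 is lost when added to 2048 first
print(a + (b + c))  # 1.0

# The same effect on logits: two runs whose kernels reduce in different
# orders can land on different values, so greedy argmax picks a
# different token and the generations diverge from there.
logits_run1 = np.array([(a + b) + c, np.float16(0.5)])
logits_run2 = np.array([a + (b + c), np.float16(0.5)])
print(np.argmax(logits_run1), np.argmax(logits_run2))  # 1 vs 0
```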

Zirui Liu (@ziruirayliu) 's Twitter Profile Photo

Seems like numerical precision not only hurts inference, it impacts RL training, too. The fix proposed in MiniMax is using FP32 for the LM head layer. I actually tried this trick before (and also an FP32 KV cache). They indeed alleviate the problem, but still have
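A sketch of that fix, with hypothetical shapes and fp16 standing in for BF16: keep the model in half precision but run the final LM-head matmul in FP32, so the logits that greedy decoding (and RL importance ratios) depend on are not rounded to half precision:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((1, 1024)).astype(np.float16)     # final hidden state
W = rng.standard_normal((1024, 4096)).astype(np.float16)  # hypothetical LM head

logits_half = h @ W                                        # half-precision path
logits_fp32 = h.astype(np.float32) @ W.astype(np.float32)  # "FP32 LM head" path

# The half-precision logits are rounded relative to the FP32 ones;
# near-tied logits can then flip argmax between runs, which is the
# nondeterminism the FP32 head reduces.
print(np.max(np.abs(logits_half.astype(np.float32) - logits_fp32)))
```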

Zhihao Jia (@jiazhihao) 's Twitter Profile Photo

📢Exciting updates from #MLSys2025! All session recordings are now available and free to watch at mlsys.org. We’re also thrilled to announce that #MLSys2026 will be held in Seattle next May—submissions open next month with a deadline of Oct 30. We look forward to

Wentao Guo (@wentaoguo7) 's Twitter Profile Photo

🦆🚀QuACK🦆🚀: new SOL mem-bound kernel library without a single line of CUDA C++ all straight in Python thanks to CuTe-DSL. On H100 with 3TB/s, it performs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With Ted Zadouri and Tri Dao

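"SOL" (speed of light) above means the kernel runs at the memory-bandwidth lower bound: a memory-bound kernel cannot finish faster than bytes moved divided by peak HBM bandwidth. A back-of-the-envelope sketch (the 3 TB/s figure is from the tweet; the shapes and function are illustrative):

```python
def sol_time_us(bytes_moved, peak_bw=3e12):
    """Speed-of-light time (microseconds) for a memory-bound kernel:
    total bytes moved over HBM divided by peak bandwidth (3 TB/s,
    the H100 figure quoted in the tweet)."""
    return bytes_moved / peak_bw * 1e6

# Hypothetical elementwise kernel over a (32768, 8192) bf16 tensor:
# read x once, write y once -> 2 * numel * 2 bytes over HBM.
numel = 32768 * 8192
bytes_moved = 2 * numel * 2
print(round(sol_time_us(bytes_moved), 1))  # ~357.9 us at 3 TB/s
```

Measured time divided by this bound gives the fraction of speed of light a kernel achieves, which is the metric libraries like this compete on.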
Denghui Zhang (@denghui_zhang) 's Twitter Profile Photo

We’re grateful that our recent work on the Theory-of-Mind of LLMs was featured by MIT Technology Review China 🙏 🔗 mittrchina.com/news/detail/15… In two new preprints, we explore: Sparsity and ToM: How extremely sparse patterns in LLMs shape their ability to infer others’ beliefs and

Forbes (@forbes) 's Twitter Profile Photo

This computer science professor became a billionaire launching four startups out of his privately-funded research lab, including unicorns Databricks and Anyscale. But it’s never been just about business. (Photo: Timothy Archibald for Forbes) trib.al/xyBDRVN

Tianqi Chen (@tqchenml) 's Twitter Profile Photo

MLSys infrastructure (compilers, inference engines, runtimes, GPU acceleration, and more) is at the heart of today's AI revolution, and AI has the potential to empower the systems revolution itself. #MLSys2026 launches an inaugural industry track; consider submitting your paper!

Denghui Zhang (@denghui_zhang) 's Twitter Profile Photo

Interpretability: Understanding how AI models think youtu.be/fGKNUvivvnc?si… via YouTube (Anthropic). Anthropic’s new video dives into AI interpretability—how models think & why it matters 🧠✨ Our EMNLP paper SafeSwitch takes a similar path: leveraging internal activations