
Ziteng Sun
@sziteng
Responsible and efficient AI.
Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell
ID: 3020905377
http://zitengsun.com 06-02-2015 03:04:03
67 Tweets
428 Followers
388 Following


After years of working on long-context efficiency, I've started to doubt if it's truly necessary (many of you have probably noticed the decline of interest in long LLMs). Despite strong models like Gemini, short-context + retrieval often do the trick: faster, cheaper, and

Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration! - 5x faster than vanilla (on HF) - 1.4x faster than EAGLE-2 (on HF) - A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang) - 1.65x speedup in latency even for large bs=64 (on SGLang) - A new

Today at 10am I will present Ziteng Sun's paper "Block Verification Accelerates Speculative Decoding"



