Ziteng Sun (@sziteng)'s Twitter Profile
Ziteng Sun

@sziteng

Responsible and efficient AI.
Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell

ID: 3020905377

Link: http://zitengsun.com | Joined: 06-02-2015 03:04:03

67 Tweets

428 Followers

388 Following

Ziteng Sun (@sziteng):

Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to the recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.

Can we align our model to better suit a given inference-time …
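For context on the inference-time procedure named above, here is a minimal, generic sketch of Best-of-N sampling. The callables generate and reward, and the choice of n, are hypothetical placeholders and not the authors' actual setup.

# Minimal sketch of Best-of-N sampling: draw n candidates and keep the one
# a reward model scores highest. `generate` and `reward` are hypothetical
# stand-ins for a policy-model sampling call and a learned reward model.
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate responses and return the highest-reward one."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx]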
Beidi Chen (@beidichen):

⏰📢 After years of working on long-context efficiency, I've started to doubt whether it's truly necessary (many of you have probably noticed the decline of interest in long LLMs). Despite strong models like Gemini, short-context + retrieval often does the trick—faster, cheaper, and …
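For readers unfamiliar with the pattern, the sketch below is a generic illustration of "short-context + retrieval": embed the question, keep only the top-k most similar chunks, and answer with a short-context model over those chunks. The embed and llm_answer callables are hypothetical placeholders, not any specific system referenced in the tweet.

# Generic retrieve-then-answer sketch (plain dot-product similarity).
import numpy as np

def retrieve_then_answer(question, chunks, embed, llm_answer, k=4):
    q = np.asarray(embed(question), dtype=float)                  # query embedding
    sims = [float(np.dot(q, np.asarray(embed(c), dtype=float))) for c in chunks]
    top = sorted(range(len(chunks)), key=lambda i: sims[i], reverse=True)[:k]
    context = "\n\n".join(chunks[i] for i in top)                 # short retrieved context
    return llm_answer(question, context)                          # short-context model call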

Hongyang Zhang (@hongyangzh):

Jointly announcing EAGLE-3 with SGLang: setting a new record in LLM inference acceleration!
- 5x 🚀 over vanilla (on HF)
- 1.4x 🚀 over EAGLE-2 (on HF)
- A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang)
- 1.65x 🚀 in latency even for a large bs=64 (on SGLang)
- A new …

Nived Rajaraman (@nived_rajaraman):

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025!

📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!
🗓️ Deadline: May 19, 2025
Ahmad Beirami @ ICLR 2025 (@abeirami):

Happening now at poster E-2804. 

Come talk to us about why reward calibration is key to alignment and how to do RLHF for test-time scaling.
Ahmad Beirami @ ICLR 2025 (@abeirami):

The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prompt.

Let me elaborate on what I mean by that and a cheaper way of doing it offline.
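To make "calibration via multiple rollouts per prompt" concrete, below is a minimal sketch of the group-relative baseline GRPO is commonly described as using: normalize each rollout's reward by the mean and standard deviation of its prompt's group. The full GRPO objective (clipped policy ratio, KL penalty) and the offline variant mentioned in the thread are omitted here; this is an illustration, not the author's exact method.

# Group-relative reward calibration: per-prompt z-scored rewards.
import numpy as np

def group_relative_advantages(rollout_rewards, eps=1e-6):
    """rollout_rewards: list with one list of rollout rewards per prompt."""
    advantages = []
    for rewards in rollout_rewards:
        r = np.asarray(rewards, dtype=float)
        advantages.append((r - r.mean()) / (r.std() + eps))  # calibrated within the group
    return advantages

# Example: two prompts, four rollouts each.
print(group_relative_advantages([[1.0, 0.0, 0.5, 1.0], [0.2, 0.8, 0.4, 0.6]]))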