Ziteng Sun (@sziteng)'s Twitter Profile
Ziteng Sun

@sziteng

Responsible and efficient AI.
Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell

ID: 3020905377

Link: http://zitengsun.com | Joined: 06-02-2015 03:04:03

67 Tweets

428 Followers

388 Following

Ziteng Sun (@sziteng)'s Twitter Profile Photo

Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to the recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.

Can we align our model to better suit a given inference-time procedure?
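
A minimal sketch of what Best-of-N looks like at inference time, to make the train/inference mismatch concrete: the policy is trained as if it emits a single response, but at inference we draw N candidates and let a reward model pick the winner. `generate_samples` and `reward_model` below are hypothetical stand-ins, not anything from the thread.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate_samples: Callable[[str, int], List[str]],  # hypothetical LLM sampler
    reward_model: Callable[[str, str], float],          # hypothetical reward scorer
    n: int = 8,
) -> str:
    """Draw n candidate responses and return the one the reward model scores highest."""
    candidates = generate_samples(prompt, n)
    return max(candidates, key=lambda response: reward_model(prompt, response))
```

Aligning the model *for* this procedure would mean optimizing the policy while accounting for the fact that this selection step happens downstream, rather than ignoring it during training.
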
Beidi Chen (@beidichen)'s Twitter Profile Photo

โฐ๐Ÿ“ขAfter years of working on long-context efficiency, Iโ€™ve started to doubt if itโ€™s truly necessary (Many of you have probably noticed the decline of interest in long llms). Despite strong models like Gemini, short-context + retrieval often do the trickโ€”faster, cheaper, and

Hongyang Zhang (@hongyangzh)'s Twitter Profile Photo

Jointly announcing EAGLE-3 with SGLang: setting a new record in LLM inference acceleration!
- 5x 🚀 over vanilla decoding (on HF)
- 1.4x 🚀 over EAGLE-2 (on HF)
- A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang)
- 1.65x 🚀 in latency even for large bs=64 (on SGLang)
- A new …
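
EAGLE-3 itself isn't described in the tweet, but it belongs to the speculative-decoding family, so here is a minimal, simplified greedy sketch of the draft-and-verify loop this kind of acceleration relies on. `draft_next` and `target_next` are hypothetical functions returning a model's greedy next token; real systems (including EAGLE on SGLang) verify the whole draft in one batched target forward pass and support sampling, not just greedy decoding.

```python
from typing import Callable, List


def speculative_decode_greedy(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # hypothetical cheap draft model
    target_next: Callable[[List[int]], int],  # hypothetical large target model
    num_draft: int = 4,
    max_new_tokens: int = 64,
) -> List[int]:
    """Greedy draft-and-verify loop: the draft proposes, the target keeps the agreeing prefix."""
    tokens = list(prefix)
    while len(tokens) - len(prefix) < max_new_tokens:
        # 1) Draft model cheaply proposes a short continuation.
        draft, ctx = [], list(tokens)
        for _ in range(num_draft):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target model verifies; accept the longest prefix it agrees with.
        ctx = list(tokens)
        for t in draft:
            if target_next(ctx) == t:
                tokens.append(t)
                ctx.append(t)
            else:
                break
        # 3) Always emit one token from the target so decoding makes progress.
        tokens.append(target_next(tokens))
    return tokens
```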

Nived Rajaraman (@nived_rajaraman)'s Twitter Profile Photo

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025!

๐Ÿ“ Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!
โ”‚
๐Ÿ—“๏ธ Deadline: May 19, 2025
Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

Happening now at poster E-2804. 

Come talk to us about why reward calibration is key to alignment and how to do RLHF for test-time scaling.
Ahmad Beirami @ ICLR 2025 (@abeirami)'s Twitter Profile Photo

The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prompt.

Let me elaborate on what I mean by that and a cheaper way of doing it offline.
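
A minimal sketch of the per-prompt calibration being described, assuming the standard GRPO-style group-relative advantage: sample several rollouts for the same prompt and normalize each reward by the group's mean and standard deviation, so a response is scored relative to the model's other attempts rather than on the raw reward scale. The cheaper offline variant mentioned in the thread is not shown here.

```python
import numpy as np


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map raw rewards for one prompt's rollouts to group-calibrated advantages."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Example: four rollouts for a single prompt, rewards on an arbitrary scale.
rewards = np.array([0.2, 0.9, 0.4, 0.5])
print(group_relative_advantages(rewards))  # positive = better than this prompt's average
```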