Ziteng Sun (@sziteng)'s Twitter Profile
Ziteng Sun

@sziteng

Responsible and efficient AI.
Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell

ID: 3020905377

Link: http://zitengsun.com

Joined: 06-02-2015 03:04:03

67 Tweets

428 Followers

388 Following


Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch. Can we align our model to better suit a given inference-time
