
Ziteng Sun

@sziteng


Responsible and efficient AI.
Topics: LLM efficiency; LLM alignment; Differential Privacy; Information Theory. Research Scientist @Google; PhD @Cornell

ID: 3020905377

Website: http://zitengsun.com

Joined: 06-02-2015 03:04:03

67 Tweets

428 Followers

388 Following

Ziteng Sun

@sziteng

6 months ago

Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch. Can we align our model to better suit a given inference-time

Likes: 250

Replies: 5

Retweets: 51
