Rafael Rafailov @ NeurIPS (@rm_rafailov)'s Twitter Profile
Rafael Rafailov @ NeurIPS

@rm_rafailov

Ph.D. Student at @StanfordAILab. I work on Foundation Models and Decision Making. Previously @GoogleDeepMind @UCBerkeley

ID: 1660344669916786688

Website: https://rmrafailov.github.io/
Joined: 21-05-2023 18:11:57

1.1K Tweets

6.6K Followers

776 Following

We have a new preprint out - your language model is not a reward, it’s a Q function!
1. The likelihood of the preferred answer must go down - it’s a policy divergence
2. MCTS guided decoding on language is equivalent to likelihood search on DPO
3. DPO learns credit assignment
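Point 3 can be made a little more concrete with a small sketch. It assumes the standard DPO formulation. In that formulation, the implicit reward of a response is β times the log-ratio between the trained policy π and a frozen reference policy π_ref, and this log-ratio splits into one term per token. The function names, the β = 0.1 default, and the plain-list inputs are hypothetical illustrations, not code from the preprint. Real implementations operate on batched tensors of token log-probabilities.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating DPO's implicit reward; not the preprint's code.


def token_rewards(policy_logps: Sequence[float],
                  ref_logps: Sequence[float],
                  beta: float = 0.1) -> List[float]:
    """Per-token implicit reward: beta * (log pi(a_t|s_t) - log pi_ref(a_t|s_t))."""
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy and reference log-probs must have the same length")
    return [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]


def sequence_reward(policy_logps: Sequence[float],
                    ref_logps: Sequence[float],
                    beta: float = 0.1) -> float:
    """Sequence-level implicit reward: the sum of the per-token terms."""
    return sum(token_rewards(policy_logps, ref_logps, beta))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(chosen_policy: Sequence[float], chosen_ref: Sequence[float],
             rejected_policy: Sequence[float], rejected_ref: Sequence[float],
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(r_chosen - r_rejected)."""
    margin = (sequence_reward(chosen_policy, chosen_ref, beta)
              - sequence_reward(rejected_policy, rejected_ref, beta))
    # -log sigmoid(m) equals log(1 + exp(-m)), i.e. softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    # Toy log-probabilities for a preferred (3 tokens) and a rejected (2 tokens) response.
    chosen_pi, chosen_ref = [-1.0, -0.5, -2.0], [-1.2, -0.9, -2.0]
    rejected_pi, rejected_ref = [-1.5, -3.0], [-1.0, -2.5]
    print("per-token credit (chosen):", token_rewards(chosen_pi, chosen_ref))
    print("loss:", dpo_loss(chosen_pi, chosen_ref, rejected_pi, rejected_ref))
```

Each token adds its own β·(log π − log π_ref) term to the sequence reward. Under these assumptions, the values returned by `token_rewards` give one rough reading of how credit is spread across a response. The loss itself only sees the summed margin between the preferred and rejected answers.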