
Julia Kempe
@kempelab
Silver Professor at NYU Courant and CDS, Research Scientist at FAIR
Research in Machine Learning, past in Quantum Computing & Finance. Posts my own.
ID: 1782793458027036672
23-04-2024 15:27:58
115 Tweets
1.1K Followers
124 Following

PILAF (Policy-Interpolated Learning for Aligned Feedback): our response sampling scheme that provably aligns LLM preference learning with maximizing the underlying oracle reward! arxiv.org/abs/2502.04270 Yunzhen Feng Ariel Kwiatkowski Kunhao Zheng Yaqi Duan AI at Meta NYU Center for Data Science
