Karel D’Oosterlinck (@kareldoostrlnck)'s Twitter Profile
Karel D’Oosterlinck

@kareldoostrlnck

Alignment, Interpretable AI, RAG, Biomedical NLP. Intern @ContextualAI, PhD student @ugent, visitor @stanfordnlp. Instigator of hikes.

ID: 1107032711477235712

Joined: 16-03-2019 21:35:40

772 Tweets

2.2K Followers

625 Following


Aligning Language Models with preferences leads to stronger and safer models (GPT3 → ChatGPT). However, preferences (RLHF) contain irrelevant signals, and alignment objectives (e.g. DPO) can actually hurt model performance. We tackle both, leading to a ~2x performance boost.

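For context, the DPO objective the tweet alludes to is the standard formulation from Rafailov et al. (2023), not something specific to this thread: it trains a policy $\pi_\theta$ against a frozen reference policy $\pi_{\mathrm{ref}}$ on preference pairs $(x, y_w, y_l)$, where $y_w$ is the preferred and $y_l$ the dispreferred response:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

Here $\sigma$ is the logistic function and $\beta$ controls how strongly the policy is kept close to the reference model; the tweet's claim is that this objective, applied to noisy preference data, can degrade rather than improve the model.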