
Parishad BehnamGhader
@parishadbehnam
NLP PhD student at @Mila_Quebec and @mcgillu
ID: 828506588902256640
http://parishadbehnam.github.io 06-02-2017 07:32:18
56 Tweets
145 Followers
99 Following

Current KL estimation practices in RLHF can produce high-variance and even negative estimates! We propose a provably better estimator that takes only a few lines of code to implement. 🧵👇 w/ Tim Vieira and Ryan Cotterell paper: arxiv.org/pdf/2504.10637 code: github.com/rycolab/kl-rb




