Yunzhen Feng (@feeelix_feng)'s Twitter Profile
Yunzhen Feng

@feeelix_feng

PhD at CDS, NYU. Ex-Intern at FAIR @AIatMeta. Previously undergrad at @PKU1898

ID: 1523345547565879298

Joined: 08-05-2022 16:54:28

87 Tweets

326 Followers

588 Following

You think on-policy sampling gives the best reward models? Think again! 🔥 Our finding: Even with on-policy data, reward models misalign with policy optimization goals! Introducing PILAF—strategic sampling that fixes this fundamentally. (1/11)