
Yunzhen Feng
@feeelix_feng
PhD at CDS, NYU. Ex-Intern at FAIR @AIatMeta. Previously undergrad at @PKU1898
ID: 1523345547565879298
08-05-2022 16:54:28
87 Tweets
326 Followers
588 Following

Check out our poster tomorrow at 10am at the ICLR Bidirectional Human-AI Alignment workshop! We cover how on-policy preference sampling can be biased and our optimal response sampling for human labeling. NYU Center for Data Science AI at Meta Julia Kempe Yaqi Duan x.com/feeelix_feng/s…


🚨 Your RL only improves pass@1, not pass@k? 🚨 That's not a bug, it's a feature of the objective you're optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
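For context on the metric the thread contrasts with pass@1: pass@k is the probability that at least one of k sampled generations is correct, and it is commonly estimated with the unbiased combinatorial estimator from the Codex evaluation literature. A minimal sketch (function name is illustrative, not from the thread):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: given n total samples of which
    c are correct, the probability that a random subset of k
    samples contains at least one correct one."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any subset
        # of size k must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# A model with a low per-sample success rate can still have high pass@k:
p1 = pass_at_k(10, 2, 1)  # pass@1 = 0.2
p8 = pass_at_k(10, 2, 8)  # pass@8 is much higher
```

This illustrates the thread's point: RL that maximizes expected single-sample reward pushes up pass@1, while pass@k also rewards diversity across the k samples, so it is a different objective.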
