
Nimit Kalra
@qw3rtman
research @haizelabs, aligning rewards. ex @citadel @utaustin
$ pip install verdict
ID: 385428300
https://nimit.io/ 05-10-2011 13:50:20
99 Tweets
781 Followers
2.2K Following

Excited to discuss "SFT Memorizes, RL Generalizes" tomorrow at Haize Labs's NYC AI Reading Group with Leonard Tang and Will Brown! We'll also explore a broader theme: "what does RL actually learn?", guided by some related works from the past week.
