Costa Huang
@vwxyzjn
RLHF @allen_ai; main dev of @cleanrl_lib; CS PhD @DrexelUniv; Ex @huggingface @CuraiHQ @weights_biases @NVIDIAAI @riotgames.
ID: 1238049606
https://costa.sh 03-03-2013 06:26:46
1.1K Tweets
5.5K Followers
1.1K Following
Happy to share our work on reproducing the RLHF scaling behaviors from OpenAI's work on summarizing from feedback. We built an RLHF pipeline from scratch and enumerated 20+ implementation details. Fun collab with Michael Noukhovitch @NeurIPS 2024, Arian Hosseini @ NeurIPS, Kashif Rasul, wang, and Lewis Tunstall.