Andrej Karpathy (@karpathy) 's Twitter Profile
Andrej Karpathy

@karpathy

Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets ๐Ÿง ๐Ÿค–๐Ÿ’ฅ

ID: 33836629

linkhttps://karpathy.ai calendar_today21-04-2009 06:49:15

9,9K Tweet

1,0M Followers

931 Following

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

# RLHF is just barely RL Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely