@karpathy : # RLHF is just barely RL Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely • TwiCopy

Andrej Karpathy

@karpathy

+ Follow

Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥

ID: 33836629

linkhttps://karpathy.ai calendar_today21-04-2009 06:49:15

9,9K Tweet

1,0M Followers

931 Following

Andrej Karpathy

@karpathy

a month ago

# RLHF is just barely RL Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely

thumb_up_off_alt8,8K

chat_bubble_outline387

repeat1,1K

shareShare