sam laki (e^-λ)
@samlakig
time to let the cat out of the bag
ID: 1509143643398893569
30-03-2022 12:21:24
59.59K Tweets
4.4K Followers
5.5K Following
Imitation is the foundation of #LLM training. And it is a #ReinforcementLearning problem! Compared to supervised learning, RL (here, inverse RL) better exploits sequential structure and online data, and further extracts rewards. Beyond thrilled for our Google DeepMind paper! A
Part 1 of hopefully many: corsix.org/content/tt-wh-… (@[email protected] might tell me if I've made any mistakes).