@aviral_kumar2: Given the confusion around what RL does for reasoning in LLMs, @setlur_amrith & I wrote a new blog post on when RL simply sharpens the base model & when it discovers new reasoning strategies. Learn how to measure discovery + methods to enable it ⬇️ tinyurl.com/rlshadis

Aviral Kumar

@aviral_kumar2

Assistant Professor of CS & ML at @CarnegieMellon. Part-time Research Scientist at Google. PhD from UC Berkeley.

ID: 737487375648100352

Website: http://aviralkumar2907.github.io

Joined: 31-05-2016 03:34:27

294 Tweets

4.4K Followers

345 Following

a month ago

Given the confusion around what RL does for reasoning in LLMs, Amrith Setlur & I wrote a new blog post on when RL simply sharpens the base model & when it discovers new reasoning strategies. Learn how to measure discovery + methods to enable it ⬇️ tinyurl.com/rlshadis

246 likes

4 replies

33 retweets
