
Aviral Kumar
@aviral_kumar2
Assistant Professor of CS & ML at @CarnegieMellon. Part-time Research Scientist at Google. PhD from UC Berkeley.
ID: 737487375648100352
http://aviralkumar2907.github.io 31-05-2016 03:34:27
294 Tweets
4.4K Followers
345 Following

Given the confusion around what RL does for reasoning in LLMs, Amrith Setlur & I wrote a new blog post on when RL simply sharpens the base model & when it discovers new reasoning strategies. Learn how to measure discovery + methods to enable it ⬇️ tinyurl.com/rlshadis