
Rishabh Agarwal
@agarwl_
Research Scientist @AIatMeta, Adjunct Prof @McGillU. Prev at @GoogleDeepMind, Google Brain, Mila, IIT Bombay. Reinforcement Learner. NeurIPS Best Paper (RLiable)
ID: 726727268391878656
https://agarwl.github.io
Joined: 01-05-2016 10:57:37
1.1K Tweets
9.9K Followers
722 Following

New Paper! RL^V: a unified RL & generative verifier
- boosts MATH accuracy by 20% and improves both sequential and parallel test-time scaling
- improves out-of-domain and easy-to-hard generalization
- allows dynamic allocation of compute for harder problems
How?
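The verifier-based scaling the tweet describes can be illustrated with a toy sketch. This is not the RL^V implementation: `generate` and `verify` are hypothetical stubs standing in for the model's generator and verifier roles, and the thresholds and budgets are made-up. It shows the two ideas named above: parallel test-time scaling (best-of-N, keep the candidate the verifier scores highest) and dynamic compute allocation (spend a larger sampling budget only when the verifier is not confident after a cheap pass).

```python
# Toy illustration only; generate() and verify() are hypothetical stand-ins
# for a model that acts as both generator and verifier.

def generate(problem: str, n: int) -> list[str]:
    """Sample n candidate solutions (stubbed with deterministic strings)."""
    return [f"{problem}-candidate-{i}" for i in range(n)]

def verify(problem: str, solution: str) -> float:
    """Return a correctness score in [0, 1] (stubbed via hash)."""
    return (hash((problem, solution)) % 1000) / 1000.0

def best_of_n(problem: str, n: int) -> str:
    """Parallel test-time scaling: sample n solutions, keep the top-scored one."""
    candidates = generate(problem, n)
    return max(candidates, key=lambda s: verify(problem, s))

def solve_with_budget(problem: str, n_easy: int = 4, n_hard: int = 16,
                      threshold: float = 0.9) -> str:
    """Dynamic compute allocation: escalate the sampling budget only when
    the verifier is not confident after the cheap first pass."""
    best = best_of_n(problem, n_easy)
    if verify(problem, best) >= threshold:
        return best                    # verifier confident -> stop early
    return best_of_n(problem, n_hard)  # harder problem -> larger budget
```

The design choice to gate the expensive pass on the verifier's score is what lets easy problems exit cheaply while hard ones receive more samples.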

All you often need is just one lucky break. For me, it was Geoffrey Hinton who took a bet on me about 7 years ago. He said something along the following lines that stuck with me: "You have tried a bunch of interesting research directions, and all of them failed – that's what

Mind the GAP! We've had a few works proposing techniques for enabling scaling in deep RL, such as MoEs, tokenization, & sparse training. Ghada Sokar and I looked further & found a bit more clarity into *what* enables scaling, leading us to simpler solutions (see GAP in figure)! 1/
