Rishabh Agarwal (@agarwl_) 's Twitter Profile
Rishabh Agarwal

@agarwl_

Research Scientist @AIatMeta, Adjunct Prof @McGillU. Prev at @GoogleDeepMind, Google Brain, Mila, IIT Bombay. Reinforcement Learner. NeurIPS Best Paper (RLiable)

ID: 726727268391878656

Link: https://agarwl.github.io · Joined: 01-05-2016 10:57:37

1.1K Tweets

9.9K Followers

722 Following

Arian Hosseini (@ariantbd) 's Twitter Profile Photo

New Paper! 📣 RL^V: a unified RL & generative verifier that boosts MATH accuracy by 20% and improves both sequential and parallel test-time scaling ☑️ improves out-of-domain and easy-to-hard generalization ☑️ allows dynamic allocation of compute for harder problems. How? 👇🏻

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

Idea: Merging generative verification and solution generation during RL training of LLM reasoners. Why? This allows you to scale inference compute both sequentially (long CoT) and in parallel (Best-of-N, weighted majority voting). Next? Generation and verification to be trained end-to-end
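
Since the thread leans on Best-of-N and weighted majority voting for the parallel axis, here is a minimal sketch of verifier-weighted voting under stated assumptions: the function name, the (answer, score) pairs, and the scores are hypothetical illustrations, not taken from the RL^V paper.

```python
from collections import defaultdict

def weighted_majority_vote(samples):
    """Aggregate N sampled solutions by summing verifier scores per
    distinct final answer; return the answer with the most verifier mass.

    samples: list of (answer, verifier_score) pairs, where the score is
    the verifier's estimated probability that the solution is correct.
    """
    votes = defaultdict(float)
    for answer, score in samples:
        votes[answer] += score
    return max(votes, key=votes.get)

# Hypothetical example: 5 sampled solutions to one math problem.
samples = [("42", 0.9), ("42", 0.7), ("41", 0.8), ("42", 0.2), ("41", 0.3)]
print(weighted_majority_vote(samples))  # "42" (1.8 total mass vs 1.1 for "41")
```

Plain Best-of-N would instead return the single highest-scoring sample; weighted voting is more robust when the verifier is noisy.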

Prime Intellect (@primeintellect) 's Twitter Profile Photo

Releasing INTELLECT-2: We’re open-sourcing the first 32B parameter model trained via globally distributed reinforcement learning: • Detailed Technical Report • INTELLECT-2 model checkpoint primeintellect.ai/blog/intellect…

1a3orn (@1a3orn) 's Twitter Profile Photo

I suspect this paper's results have been oversold somewhat. As far as I can tell, nothing in the paper excludes the possibility that a quite large % of the "learning" here is just "learns to put answers in \boxed{...}" tags.

rohan anil (@_arohan_) 's Twitter Profile Photo

Someone passed this wisdom to me today. Deep learning techniques working vs not working comes down to two devils: your prior about the technique, and your attention to details in the implementation of the technique. Need both to make it work.

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

All you often need is just one lucky break. For me, it was Geoffrey Hinton who took a bet on me about 7 years ago. He said something along the following lines that stuck with me: “You have tried a bunch of interesting research directions, and all of them failed — that’s what

Lior⚡ (@lioronai) 's Twitter Profile Photo

An undergrad student broke a 40-year-old belief in computer science. Since 1985, it was believed that hash tables, when nearly full, must check many spots to find or add data. Andrew Krapivin discovered a new way to organize data inside a hash table that avoids this slowdown.

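For context on the "must check many spots" claim, here is a small, hedged experiment with a classic linear-probing hash table (a standard baseline for illustration, not Krapivin's construction): it estimates how many slots an unsuccessful lookup touches as the table fills up.

```python
import random

def avg_probes_unsuccessful(load_factor, size=50_000, trials=2_000):
    """Build a linear-probing hash table at the given load factor, then
    measure the average number of slots examined before hitting an empty
    slot (the cost of an unsuccessful lookup, or of inserting a new key)."""
    table = [None] * size
    for key in random.sample(range(10 * size), int(size * load_factor)):
        i = hash(key) % size
        while table[i] is not None:   # linear probing: step to the next slot
            i = (i + 1) % size
        table[i] = key
    total = 0
    for _ in range(trials):
        i = random.randrange(size)    # probe from a random starting slot
        probes = 1
        while table[i] is not None:
            i = (i + 1) % size
            probes += 1
        total += probes
    return total / trials

for lf in (0.5, 0.9, 0.99):
    print(f"load {lf:.2f}: ~{avg_probes_unsuccessful(lf):.0f} probes")
# Probe counts blow up as the table fills; Krapivin's result shows this
# blow-up is not inevitable with a smarter way of organizing the table.
```
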
Nathan Lambert (@natolambert) 's Twitter Profile Photo

The reason recent RLVR papers show mostly formatting and not learning new skills is just because no one has scaled up enough. If RL compute is <0.1% of overall compute, of course not much changes. I bet o3 is closer to 5% of total compute. At 10-25%, I bet the models feel different again.

Pablo Samuel Castro (@pcastr) 's Twitter Profile Photo

Mind the GAP! we've had a few works proposing techniques for enabling scaling in deep rl, such as MoEs, tokenization, & sparse training. Ghada Sokar and i looked further & found a bit more clarity into *what* enables scaling, leading us to simpler solutions (see GAP in figure)! 1/

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

For DeepSeek-R1, I am trying to estimate how much training compute was used for RL vs pre-training. My current estimate is that 12-18% of pre-training compute was used for RL training! Is this estimate off? Pre-training: 2788K H800 hours for DeepSeek-V3 Base. BS = 6144 with 4K seq
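
A hedged back-of-envelope version of that estimate: the RL run shape below (steps, prompts, rollouts, tokens per rollout) is assumed purely for illustration and is not DeepSeek's disclosed figure; the 37B activated parameters and 14.8T pre-training tokens are from the DeepSeek-V3 report, and the 2788K H800 hours is from the tweet.

```python
# Back-of-envelope in FLOPs, following the tweet's framing.
ACTIVE_PARAMS = 37e9        # DeepSeek-V3 activates ~37B params per token (MoE)
PRETRAIN_TOKENS = 14.8e12   # V3 pre-training corpus size

# Assumed RL run shape (illustrative, not DeepSeek's disclosed numbers):
steps, prompts, rollouts, toks_per_rollout = 8_000, 1_024, 16, 12_000

rl_tokens = steps * prompts * rollouts * toks_per_rollout
gen_flops = 2 * ACTIVE_PARAMS * rl_tokens    # ~2N FLOPs per generated token
train_flops = 6 * ACTIVE_PARAMS * rl_tokens  # fwd+bwd pass over the same tokens
pretrain_flops = 6 * ACTIVE_PARAMS * PRETRAIN_TOKENS

print(f"RL / pre-training: {(gen_flops + train_flops) / pretrain_flops:.0%}")
# -> about 14% with these assumptions, inside the tweet's 12-18% ballpark.
```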

Rulin Shao (@rulinshao) 's Twitter Profile Photo

100% agree! In our recent work, we show RLVR can even work with random rewards on Qwen2.5-Math. However, all these surprising phenomena are more of an artifact of certain models: not generalizable to models with a different prior, and also unlikely to hold at large scale 🤔 x.com/StellaLisy/sta…

Dimitris Papailiopoulos (@dimitrispapail) 's Twitter Profile Photo

Random labels helping makes sense when P(correct|random) > 0, since the exchange rate of accuracy is much higher for true-positive than for false-negative/false-positive examples. Wrong labels working makes little sense, and is a result of undertraining + trajectories being

Sinclair Wang (@sinclairwang1) 's Twitter Profile Photo

I believe that we need a deeper understanding of the relationship between pre-training and RL scaling. How can we perform pre-training better, making language models more suitable and smoother for RL scaling? That is to say, pre-training for RL. If you are interested in it, welcome to

Ganqu Cui (@charlesfornlp) 's Twitter Profile Photo

So many works talking about entropy, but what is the **mechanism** of entropy in RL for LLMs? 🤔 Our work gives a principled understanding, as well as two tricks that get entropy **controlled** 🧵

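To ground the entropy discussion, here is a generic sketch of what "policy entropy" means for an LLM and one standard way to keep it controlled, an entropy bonus in the policy-gradient loss. This is a common baseline trick, not necessarily one of the paper's two tricks, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits):
    """Entropy of the policy's next-token distribution at each position.
    logits: [batch, seq, vocab]. RL fine-tuning tends to drive this down
    (entropy collapse), which kills exploration."""
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)          # [batch, seq]

def pg_loss_with_entropy_bonus(logprobs, advantages, logits, beta=0.01):
    """REINFORCE-style loss plus an entropy bonus: minimizing this loss
    maximizes expected advantage while discouraging entropy collapse."""
    pg = -(logprobs * advantages).mean()
    return pg - beta * token_entropy(logits).mean()  # bonus favors entropy
```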