Rahul Madhavan (@imrahulmaddy) 's Twitter Profile
Rahul Madhavan

@imrahulmaddy

learning.

PhD at IISc.

ID: 1329481993965375489

Link: https://wimwian.iima.ac.in/2018/07/17/an-ode-to-the-lone-scientist/
Joined: 19-11-2020 17:49:52

3.3K Tweets

947 Followers

1.1K Following

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

Q: Does the argument here assume that each bit of information (e.g., binary reward) would cause similar change in weights? Change in weights likely has to do with how surprising this bit was: a good solution our LLM would have sampled 1 out of 10 times vs 1 out of a million
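
The surprise framing in the question can be made precise with information-theoretic surprisal (a hedged illustration, not something stated in the thread): a solution the model already samples 1 in 10 times carries far fewer bits than a 1-in-a-million one, so the same binary reward arguably carries very different amounts of information.

```python
import math

# Surprisal of an event with probability p, measured in bits: -log2(p).
# Contrasts the tweet's two cases: a 1-in-10 vs a 1-in-a-million solution.
def surprisal_bits(p: float) -> float:
    return -math.log2(p)

common = surprisal_bits(1 / 10)        # ~3.3 bits
rare = surprisal_bits(1 / 1_000_000)   # ~19.9 bits
print(common, rare)
```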

Etash Guha @ ICLR (@etash_guha) 's Twitter Profile Photo

IMO PPO/GRPO won’t be enough to train good open source agents and probably aren’t what frontier labs use. Main issue: online RL is too inefficient in terms of compute. In every batch, you have to wait for all generations to finish, and modern day agentic rollouts can take 5 to 10
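
The batching bottleneck can be sketched with toy numbers (illustrative assumptions, not measurements from the tweet): in synchronous on-policy RL, a batch finishes only when its slowest rollout does, so one long agentic rollout stalls every worker.

```python
# Toy per-rollout generation times (minutes), assumed for illustration.
# One slow agentic rollout dominates the synchronous batch.
rollout_minutes = [2, 3, 4, 45]

sync_batch_time = max(rollout_minutes)                      # batch waits 45 min
useful_work = sum(rollout_minutes) / len(rollout_minutes)   # 13.5 min average
utilization = useful_work / sync_batch_time                 # workers ~30% busy
print(sync_batch_time, round(utilization, 2))
```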

Jubayer Ibn Hamid (@jubayer_hamid) 's Twitter Profile Photo

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks

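
One standard mitigation for this collapse (a hedged sketch of a common technique, not something claimed in the tweet) is an entropy bonus added to the policy-gradient objective, which rewards keeping probability mass spread across actions.

```python
import math

# Entropy of a discrete policy; high when probability mass is spread out.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Policy-gradient loss with an entropy bonus (beta is a hypothetical
# hyperparameter): minimizing this maximizes the advantage-weighted
# log-prob while discouraging collapse onto a single action.
def pg_loss(logprob_action, advantage, probs, beta=0.01):
    return -(logprob_action * advantage + beta * entropy(probs))

uniform = entropy([0.25] * 4)               # maximal for 4 actions: ln(4)
peaked = entropy([0.97, 0.01, 0.01, 0.01])  # a nearly collapsed policy
print(uniform > peaked)
```
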
Nouha Dziri (@nouhadziri) 's Twitter Profile Photo

🚀Ever wondered how to make RL work on impossibly hard tasks where pass@k = 0%? 🤔

In our new work, we share the RL Grokking Recipe: a training recipe that enables LLMs to solve previously unsolvable coding problems! I will be at #CoLM2025 next week so happy to chat about it!
Rahul Madhavan (@imrahulmaddy) 's Twitter Profile Photo

Humans are born with a high amount of inductive bias. This is why we find vision easy vs math hard. Our neural pathways are well defined before we start using them. The precursor to object detection is well defined. Math on the other hand was not required by evolution. Hence it

Rahul Madhavan (@imrahulmaddy) 's Twitter Profile Photo

The only good take in this entire debate. TODO: distill the learning algorithm, as distinct from the knowledge of humans, into the initial training; LLMs are a good architectural substrate to do that.

Wes Roth (@wesrothmoney) 's Twitter Profile Photo

AlphaEvolve Just Helped Prove New Theorems in Complexity Theory

Google DeepMind's AlphaEvolve just made real breakthroughs in theoretical computer science. 

Instead of generating full proofs, it discovered new combinatorial structures that plug into existing proof frameworks,

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

Nesterov dropped a new paper last week on what functions can be optimized with gradient descent.
The idea is simple: we know GD can optimize both nonsmooth (bounded grads) and smooth (Lipschitz grads) functions, but smooth+nonsmooth satisfies neither property, so what can we do?
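
The two classical regimes the tweet contrasts come with different step-size rules (a minimal sketch under the standard assumptions, not the new paper's result): a constant step 1/L for L-smooth functions, diminishing steps for nonsmooth functions with bounded subgradients.

```python
import math

# Smooth case: f(x) = x^2 has a 2-Lipschitz gradient (L = 2),
# so the constant step 1/L is the classical safe choice.
def gd_smooth(x, steps, L=2.0):
    for _ in range(steps):
        x -= (2 * x) / L          # gradient of x^2 is 2x
    return x

# Nonsmooth case: f(x) = |x| has bounded subgradients (|g| <= 1),
# so diminishing steps ~ 1/sqrt(t) are the classical choice.
def gd_nonsmooth(x, steps):
    for t in range(steps):
        x -= math.copysign(1.0, x) / math.sqrt(t + 1) if x != 0 else 0.0
    return x

print(gd_smooth(2.0, 1))            # lands exactly at the minimizer 0.0
print(abs(gd_nonsmooth(2.0, 200)))  # oscillates near 0, within the last step size
```
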
Jelani Nelson (@minilek) 's Twitter Profile Photo

I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini

Middle East Eye (@middleeasteye) 's Twitter Profile Photo

A new wave of flotillas are sailing towards Gaza as one vessel, the Conscience, carries 92 participants, including paramedics and journalists from 26 countries. After being obstructed in April and struck by drones near Malta earlier this year, the Conscience sets its sights on

Antoine Moulin (@antoine_mln) 's Twitter Profile Photo

This is a bad take. I thought I would ignore it, but I'm done with my deadlines, it's Friday, 3 am (well, 5), so why not? Short thread on why offline RL is, in fact, RL (🤯). 🧵

Guy Ohayon (@guy__ohayon) 's Twitter Profile Photo

The Mahalanobis distance is the natural metric for Gaussian signals. But how can it be generalized to arbitrary probability densities? And how should a solution be tested? We address these questions in a new paper with Pierre-Étienne Fiquet, Florentin Guth, Jona Ballé, and Eero Simoncelli.
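
For the Gaussian case the tweet starts from, the metric has a simple closed form (a minimal dependency-free sketch of the standard definition, restricted here to a diagonal covariance; the paper's generalization is not reproduced):

```python
import math

# Mahalanobis distance for a Gaussian with diagonal covariance:
# sqrt(sum((x_i - mu_i)^2 / var_i)). A general covariance needs
# Sigma^{-1}; kept diagonal here to stay dependency-free.
def mahalanobis_diag(x, mu, variances):
    return math.sqrt(sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, variances)))

# Toy example: variance 2 in the first coordinate, 0.5 in the second;
# each coordinate is weighted by its inverse variance.
d = mahalanobis_diag([1.0, 1.0], [0.0, 0.0], [2.0, 0.5])
print(d)  # sqrt(1/2 + 1/0.5) = sqrt(2.5)
```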

Rahul Madhavan (@imrahulmaddy) 's Twitter Profile Photo

Counterfactual reasoning is the following: take the current model and look at its implications on the real world. Now can you come up with all other models which reach the same conclusions? String theory is one counterfactual model. Are there others? Do some of them make

Alexia Jolicoeur-Martineau (@jm_alexia) 's Twitter Profile Photo

New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M parameters neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: alexiajm.github.io/2025/09/29/tin… Code: github.com/SamsungSAILMon… Paper: arxiv.org/abs/2510.04871

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

I started playing a bit around with Tinker for RL runs on Qwen3 models and one thing I'm impressed by is the small KL discrepancy between the generator and trainer across dense and MoE models. 

This is 10x smaller than what I typically observe for Qwen dense models if I were to
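
The generator/trainer discrepancy is typically measured from the log-probs the two policies assign to the same sampled tokens (a hedged sketch of the usual Monte-Carlo estimator; Tinker's actual API is not shown here):

```python
# Monte-Carlo estimate of KL(generator || trainer) from per-token
# log-probs over tokens sampled from the generator:
# E_{x ~ gen}[log gen(x) - log train(x)].
def kl_estimate(gen_logprobs, train_logprobs):
    diffs = [g - t for g, t in zip(gen_logprobs, train_logprobs)]
    return sum(diffs) / len(diffs)

# Hypothetical per-token log-probs; a near-zero estimate means the
# trainer's policy closely matches whatever generated the samples.
gen = [-1.00, -2.00, -0.50]
train = [-1.05, -2.10, -0.52]
print(kl_estimate(gen, train))
```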
Sundar Pichai (@sundarpichai) 's Twitter Profile Photo

Congrats to Michel Devoret, John Martinis, and John Clarke on the Nobel Prize in Physics. 🔬🥼 Michel is chief scientist of hardware at our Quantum AI lab and John Martinis led the hardware team for many years. Their pioneering work in quantum mechanics in the 1980s made recent

Ethan Mollick (@emollick) 's Twitter Profile Photo

People keep talking about innovation from Bell Labs, which produced 18 Nobel Prize winners (11 Prizes) over 100 years. But Google now has produced 6 Nobel winners (3 Prizes) in less than 30 years & was not subsidized by a government-enforced monopoly.