Rahul Madhavan (@imrahulmaddy) 's Twitter Profile
Rahul Madhavan

@imrahulmaddy

learning.

PhD at IISc.

ID: 1329481993965375489

Link: https://wimwian.iima.ac.in/2018/07/17/an-ode-to-the-lone-scientist/
Joined: 19-11-2020 17:49:52

3.3K Tweets

947 Followers

1.1K Following

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

Q: Does the argument here assume that each bit of information (e.g., binary reward) would cause similar change in weights? Change in weights likely has to do with how surprising this bit was: a good solution our LLM would have sampled 1 out of 10 times vs 1 out of a million
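
The surprise framing in the question can be made precise with information-theoretic surprisal (a hedged illustration, not something stated in the thread): a solution the model already samples 1 in 10 times carries far fewer bits than a 1-in-a-million one, so the same binary reward arguably carries very different amounts of information.

```python
import math

# Surprisal of an event with probability p, measured in bits: -log2(p).
# Contrasts the tweet's two cases: a 1-in-10 vs a 1-in-a-million solution.
def surprisal_bits(p: float) -> float:
    return -math.log2(p)

common = surprisal_bits(1 / 10)        # ~3.3 bits
rare = surprisal_bits(1 / 1_000_000)   # ~19.9 bits
print(common, rare)
```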

Etash Guha @ ICLR (@etash_guha) 's Twitter Profile Photo

IMO PPO/GRPO won’t be enough to train good open source agents and probably aren’t what frontier labs use. Main issue: online RL is too inefficient in terms of compute. In every batch, you have to wait for all generations to finish, and modern day agentic rollouts can take 5 to 10
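
The batching bottleneck can be sketched with toy numbers (illustrative assumptions, not measurements from the tweet): in synchronous on-policy RL, a batch finishes only when its slowest rollout does, so one long agentic rollout stalls every worker.

```python
# Toy per-rollout generation times (minutes), assumed for illustration.
# One slow agentic rollout dominates the synchronous batch.
rollout_minutes = [2, 3, 4, 45]

sync_batch_time = max(rollout_minutes)                      # batch waits 45 min
useful_work = sum(rollout_minutes) / len(rollout_minutes)   # 13.5 min average
utilization = useful_work / sync_batch_time                 # workers ~30% busy
print(sync_batch_time, round(utilization, 2))
```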

Jubayer Ibn Hamid (@jubayer_hamid) 's Twitter Profile Photo

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks

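
One standard mitigation for this collapse (a hedged sketch of a common technique, not something claimed in the tweet) is an entropy bonus added to the policy-gradient objective, which rewards keeping probability mass spread across actions.

```python
import math

# Entropy of a discrete policy; high when probability mass is spread out.
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Policy-gradient loss with an entropy bonus (beta is a hypothetical
# hyperparameter): minimizing this maximizes the advantage-weighted
# log-prob while discouraging collapse onto a single action.
def pg_loss(logprob_action, advantage, probs, beta=0.01):
    return -(logprob_action * advantage + beta * entropy(probs))

uniform = entropy([0.25] * 4)               # maximal for 4 actions: ln(4)
peaked = entropy([0.97, 0.01, 0.01, 0.01])  # a nearly collapsed policy
print(uniform > peaked)
```
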
Nouha Dziri (@nouhadziri) 's Twitter Profile Photo

🚀Ever wondered how to make RL work on impossibly hard tasks where pass@k = 0%? 🤔

In our new work, we share the RL Grokking Recipe: a training recipe that enables LLMs to solve previously unsolvable coding problems! I will be at #CoLM2025 next week so happy to chat about it!
Rahul Madhavan (@imrahulmaddy) 's Twitter Profile Photo

Humans are born with a high amount of inductive bias. This is why we find vision easy vs math hard. Our neural pathways are well defined before we start using them. The precursor to object detection is well defined. Math on the other hand was not required by evolution. Hence it

Rahul Madhavan (@imrahulmaddy) 's Twitter Profile Photo

The only good take in this entire debate. TODO: distill the learning algorithm, as distinct from the knowledge of humans, into the initial training; LLMs are a good architectural substrate to do that.

Wes Roth (@wesrothmoney) 's Twitter Profile Photo

AlphaEvolve Just Helped Prove New Theorems in Complexity Theory

Google DeepMind's AlphaEvolve just made real breakthroughs in theoretical computer science. 

Instead of generating full proofs, it discovered new combinatorial structures that plug into existing proof frameworks,

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

Nesterov dropped a new paper last week on what functions can be optimized with gradient descent.
The idea is simple: we know GD can optimize both nonsmooth (bounded grads) and smooth (Lipschitz grads) functions, but smooth+nonsmooth satisfies neither property, so what can we do?
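
The two classical regimes the tweet contrasts come with different step-size rules (a minimal sketch under the standard assumptions, not the new paper's result): a constant step 1/L for L-smooth functions, diminishing steps for nonsmooth functions with bounded subgradients.

```python
import math

# Smooth case: f(x) = x^2 has a 2-Lipschitz gradient (L = 2),
# so the constant step 1/L is the classical safe choice.
def gd_smooth(x, steps, L=2.0):
    for _ in range(steps):
        x -= (2 * x) / L          # gradient of x^2 is 2x
    return x

# Nonsmooth case: f(x) = |x| has bounded subgradients (|g| <= 1),
# so diminishing steps ~ 1/sqrt(t) are the classical choice.
def gd_nonsmooth(x, steps):
    for t in range(steps):
        x -= math.copysign(1.0, x) / math.sqrt(t + 1) if x != 0 else 0.0
    return x

print(gd_smooth(2.0, 1))            # lands exactly at the minimizer 0.0
print(abs(gd_nonsmooth(2.0, 200)))  # oscillates near 0, within the last step size
```
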
Jelani Nelson (@minilek) 's Twitter Profile Photo

I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini

Middle East Eye (@middleeasteye) 's Twitter Profile Photo

A new wave of flotillas are sailing towards Gaza as one vessel, the Conscience, carries 92 participants, including paramedics and journalists from 26 countries. After being obstructed in April and struck by drones near Malta earlier this year, the Conscience sets its sights on

Antoine Moulin (@antoine_mln) 's Twitter Profile Photo

This is a bad take. I thought I would ignore it, but I'm done with my deadlines, it's Friday, 3 am (well, 5), so why not? Short thread on why offline RL is, in fact, RL (🤯). 🧵

Guy Ohayon (@guy__ohayon) 's Twitter Profile Photo

The Mahalanobis distance is the natural metric for Gaussian signals. But how can it be generalized to arbitrary probability densities? And how should a solution be tested? We address these questions in a new paper with Pierre-Étienne Fiquet, Florentin Guth, Jona Ballé, and Eero Simoncelli.
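
For the Gaussian case the tweet starts from, the metric has a simple closed form (a minimal dependency-free sketch of the standard definition, restricted here to a diagonal covariance; the paper's generalization is not reproduced):

```python
import math

# Mahalanobis distance for a Gaussian with diagonal covariance:
# sqrt(sum((x_i - mu_i)^2 / var_i)). A general covariance needs
# Sigma^{-1}; kept diagonal here to stay dependency-free.
def mahalanobis_diag(x, mu, variances):
    return math.sqrt(sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, variances)))

# Toy example: variance 2 in the first coordinate, 0.5 in the second;
# each coordinate is weighted by its inverse variance.
d = mahalanobis_diag([1.0, 1.0], [0.0, 0.0], [2.0, 0.5])
print(d)  # sqrt(1/2 + 1/0.5) = sqrt(2.5)
```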

Rahul Madhavan (@imrahulmaddy) 's Twitter Profile Photo

Counterfactual reasoning is the following: take the current model and look at its implications on the real world. Now can you come up with all other models which reach the same conclusions? String theory is one counterfactual model. Are there others? Do some of them make

Alexia Jolicoeur-Martineau (@jm_alexia) 's Twitter Profile Photo

New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M parameters neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: alexiajm.github.io/2025/09/29/tin… Code: github.com/SamsungSAILMon… Paper: arxiv.org/abs/2510.04871

Rishabh Agarwal (@agarwl_) 's Twitter Profile Photo

I started playing a bit around with Tinker for RL runs on Qwen3 models and one thing I'm impressed by is the small KL discrepancy between the generator and trainer across dense and MoE models. 

This is 10x smaller than what I typically observe for Qwen dense models if I were to
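
The generator/trainer discrepancy is typically measured from the log-probs the two policies assign to the same sampled tokens (a hedged sketch of the usual Monte-Carlo estimator; Tinker's actual API is not shown here):

```python
# Monte-Carlo estimate of KL(generator || trainer) from per-token
# log-probs over tokens sampled from the generator:
# E_{x ~ gen}[log gen(x) - log train(x)].
def kl_estimate(gen_logprobs, train_logprobs):
    diffs = [g - t for g, t in zip(gen_logprobs, train_logprobs)]
    return sum(diffs) / len(diffs)

# Hypothetical per-token log-probs; a near-zero estimate means the
# trainer's policy closely matches whatever generated the samples.
gen = [-1.00, -2.00, -0.50]
train = [-1.05, -2.10, -0.52]
print(kl_estimate(gen, train))
```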
Sundar Pichai (@sundarpichai) 's Twitter Profile Photo

Congrats to Michel Devoret, John Martinis, and John Clarke on the Nobel Prize in Physics. 🔬🥼 Michel is chief scientist of hardware at our Quantum AI lab and John Martinis led the hardware team for many years. Their pioneering work in quantum mechanics in the 1980s made recent

Ethan Mollick (@emollick) 's Twitter Profile Photo

People keep talking about innovation from Bell Labs, which produced 18 Nobel Prize winners (11 Prizes) over 100 years. But Google now has produced 6 Nobel winners (3 Prizes) in less than 30 years & was not subsidized by a government-enforced monopoly.