Alexandre Ramé (@ramealexandre)'s Twitter Profile
Alexandre Ramé

@ramealexandre

Research scientist @GoogleDeepMind. Previously PhD @Sorbonne_Univ_.

Post-training Gemma LLMs: distillation, RL and merging.

ID: 300445195

Link: https://alexrame.github.io/ · Joined: 17-05-2011 19:37:18

662 Tweets

1.1K Followers

731 Following

Nathan Lambert (@natolambert)

The reason recent RLVR papers show mostly formatting and not learning new skills is just because no one has scaled up enough. If RL compute is <0.1% of overall compute, of course not much changes. I bet o3 is closer to 5% of total compute. At 10-25%, I bet the models feel different again.

AshutoshShrivastava (@ai_for_success)

WHY IS NO ONE TALKING ABOUT THIS?? The Gemma 3n model was one of the best surprises for me. The fact that you can run it on edge devices even with just 2GB of RAM is impressive. A few weeks back, I was on holiday and used the Gemini Live feature a lot. But I kept running into

YIFENG LIU (@yifengliu_ai)

1/6 We introduce RPG, a principled framework for deriving and analyzing KL-regularized policy gradient methods, unifying GRPO/k3-estimator and REINFORCE++ under this framework and discovering better RL objectives than GRPO:
Paper: arxiv.org/abs/2505.17508
Code:
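For readers who want the shape of the objective: a KL-regularized policy gradient adds a penalty toward a frozen reference policy to the usual advantage-weighted log-probability term, and GRPO's variant estimates that KL with the so-called k3 estimator. Below is a minimal PyTorch sketch of that family; the names and β value are illustrative, not the paper's code.

```python
import torch

def kl_regularized_pg_loss(logp, logp_ref, advantages, beta=0.04):
    """Per-token policy-gradient loss with a k3-estimator KL penalty.

    logp:       log pi_theta(token), requires grad
    logp_ref:   log pi_ref(token), frozen reference policy
    advantages: per-token advantages (detached inside)
    """
    # k3 estimator of KL(pi_theta || pi_ref): r - 1 - log r, with
    # r = pi_ref / pi_theta. Unbiased and always non-negative.
    log_ratio = logp_ref - logp
    k3 = torch.exp(log_ratio) - log_ratio - 1.0

    # REINFORCE-style term: detached advantages weight the log-probs.
    pg = -(advantages.detach() * logp)

    return (pg + beta * k3).mean()

# Toy usage with made-up per-token probabilities and advantages.
logp = torch.log(torch.tensor([0.3, 0.5, 0.2], requires_grad=True))
logp_ref = torch.log(torch.tensor([0.25, 0.55, 0.2]))
adv = torch.tensor([1.0, 1.0, -0.5])
loss = kl_regularized_pg_loss(logp, logp_ref, adv)
loss.backward()
```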
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Reinforcing General Reasoning without Verifiers

"we propose a verifier-free method (VeriFree) that bypasses answer verification and instead uses RL to directly maximize the probability of generating the reference answer. We compare VeriFree with verifier-based methods and
Ning Ding (@stingning)

Language models are trading Entropy for Rewards in reinforcement learning, meaning that uncertainty is transformed into certainty. The trading is even quantitatively predictable: R = -a * exp(H) + b. In our latest paper, we find that we should, and we can scientifically
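Since the claimed law R = -a * exp(H) + b is linear in exp(H), the coefficients can be recovered from logged (entropy, reward) pairs with ordinary least squares. A toy sketch with made-up numbers:

```python
import numpy as np

# Fit R = -a * exp(H) + b by least squares on logged (entropy, reward) pairs.
H = np.array([2.1, 1.7, 1.3, 1.0, 0.8])      # policy entropy over training (made up)
R = np.array([0.22, 0.35, 0.47, 0.55, 0.60])  # validation reward (made up)

X = np.stack([-np.exp(H), np.ones_like(H)], axis=1)
(a, b), *_ = np.linalg.lstsq(X, R, rcond=None)
print(f"R ≈ -{a:.3f} * exp(H) + {b:.3f}")
```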

Omar Khattab (@lateinteraction)

Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long

Mustafa Shukor (@mustafashukor1)

The Worldwide LeRobot hackathon is in 2 weeks, and we have been cooking something for you…
Introducing SmolVLA, a Vision-Language-Action model with a lightweight architecture, pretrained on community datasets, with an asynchronous inference stack, to control robots🧵
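The "asynchronous inference stack" decouples action prediction from execution: the robot plays out the current action chunk while the policy computes the next one in the background. A generic sketch of that pattern (illustrative names only, not LeRobot's actual API):

```python
import queue
import threading
import time

def policy_worker(policy, obs_q, chunk_q):
    """Background thread: turns the latest observation into the next action chunk."""
    while True:
        obs = obs_q.get()
        if obs is None:            # shutdown signal
            return
        chunk_q.put(policy(obs))   # e.g. a short list of low-level actions

def control_loop(policy, get_obs, execute, n_chunks=100, hz=30):
    obs_q, chunk_q = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    threading.Thread(target=policy_worker, args=(policy, obs_q, chunk_q),
                     daemon=True).start()
    obs_q.put(get_obs())
    chunk = chunk_q.get()          # only the first chunk requires waiting
    for _ in range(n_chunks):
        obs_q.put(get_obs())       # request the next chunk *before* executing,
        for action in chunk:       # so prediction overlaps execution
            execute(action)
            time.sleep(1 / hz)     # fixed-rate control
        chunk = chunk_q.get()      # usually ready by now: no robot stall
    obs_q.put(None)
```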
Andrei Bursuc (@abursuc)

CVPR@Paris: what a fantastic and refreshing event this was: awesome talks and posters, great researchers, amazing food and fantastic organizers. Let's do it again
Vicky Kalogeiton David Picard Matthieu Cord
#cvprinparis #CVPR2025
Qingxiu Dong (@qx_dong)

⏰ We introduce Reinforcement Pre-Training (RPT🍒)

— reframing next-token prediction as a reasoning task using RLVR

✅ General-purpose reasoning
📑 Scalable RL on web corpus
📈 Stronger pre-training + RLVR results
🚀 Allows allocating more compute to specific tokens
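The reframing is compact: before each next-token prediction the model samples a reasoning trace, and it is rewarded only when its final prediction matches the actual corpus token, so ordinary web text becomes a verifiable RL task. A sketch with hypothetical helpers:

```python
def rpt_reward(model, prefix_ids, true_next_id):
    """Reinforcement Pre-Training reward (sketch): think, then predict.

    The model first samples a chain-of-thought about the prefix, then
    commits to a next token; the corpus itself acts as the verifier.
    Both model methods below are hypothetical helpers, not a real API.
    """
    thought = model.sample_reasoning(prefix_ids)
    predicted = model.predict_next_token(prefix_ids, thought)
    return 1.0 if predicted == true_next_id else 0.0  # verifiable, for free
```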
Omar Sanseviero (@osanseviero)

Gemma 3n on desktop is here! 🚀
🤗 Desktop (Mac/Windows/Linux) + IoT
🔥 2B and 4B
🧠 Powered by the new LiteRT-LM library
Github: github.com/google-ai-edge…
Preview: huggingface.co/google/gemma-3…

Arcee.ai (@arcee_ai)

Another research report, this time from the Massachusetts Institute of Technology (MIT), confirms the unique benefits of model merging!

In a paper recently published on Nature.com (nature.com/articles/s4152…), researchers at MIT explore a wide range of strategies to adapt
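In its simplest form, model merging is parameter-space averaging of fine-tuned checkpoints that share an architecture ("model soups"); the paper surveys many richer variants. A minimal PyTorch sketch of the uniform/weighted average, not the paper's code:

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Weighted parameter-space average of same-architecture checkpoints."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged

# Usage: average two fine-tunes of the same base model.
# merged = merge_state_dicts([model_a.state_dict(), model_b.state_dict()])
# base_model.load_state_dict(merged)
```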
Omar Sanseviero (@osanseviero)

Gemma 3n is the first model with less than 10B parameters with an LMArena score above 1300 🔥

And yes, you can run it on your phone

Try it in AIS: aistudio.google.com/prompts/new_ch…
AI Edge: github.com/google-ai-edge…
Nathan Lambert (@natolambert)

A common trend across recent research in using reinforcement learning to train reasoning models is that the clipping operation within a trust region (core to PPO, adopted by GRPO) is squashing rare tokens that are key to clever behaviors like verification or backtracking. The
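The squashing is visible directly in the clipped surrogate objective: once a token's importance ratio leaves the [1-ε, 1+ε] trust region, the min selects the clipped (constant) branch and the gradient on that token vanishes, and a rare token being upweighted hits the ceiling quickly. A minimal sketch of the standard objective:

```python
import torch

def ppo_clip_loss(logp, logp_old, advantages, eps=0.2):
    """Standard clipped surrogate loss (PPO; GRPO reuses the same clipping)."""
    ratio = torch.exp(logp - logp_old)            # importance ratio per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    # Once ratio > 1+eps with advantage > 0 (e.g. a rare token being
    # upweighted), the clipped branch wins the min; it is constant in
    # logp, so the gradient on that token is zero.
    return -torch.min(unclipped, clipped).mean()
```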
Philipp Schmid (@_philschmid)

Gemini 2.5 is production ready! We just launched 3 new Gemini models, with 2.5 Pro and Flash now generally available and a new Gemini 2.5 Flash Lite preview! 🧠⚡️🔦

Here is all you need to know:
🔦 New Gemini 2.5 Flash Lite (Preview) with Thinking, 1M context, only
Google DeepMind (@googledeepmind)

Hot Gemini updates off the press. 🚀 Anyone can now use 2.5 Flash and Pro to build and scale production-ready AI applications. 🙌 We’re also launching 2.5 Flash-Lite in preview: the fastest model in the 2.5 family to respond to requests, with the lowest cost too. 🧵

Paul Couvert (@itspaulai)

Google has just released Gemini 2.5 Flash Lite

This is the cheapest and fastest model available:

You can literally:

- Process the entire Harry Potter series for $0.22
- Analyze a 3-hour video for less than $0.35

And you can also enable thinking mode to enhance its
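Those dollar figures are just tokens × price. A back-of-the-envelope check with assumed numbers (the price and token counts below are placeholders, not quoted pricing):

```python
# Back-of-the-envelope token cost, with assumed (not official) numbers.
PRICE_PER_M_INPUT = 0.10     # $/1M input tokens -- assumption
series_words = 1_100_000     # rough Harry Potter series length -- estimate
tokens = series_words * 1.3  # ~1.3 tokens per English word -- rule of thumb
print(f"~${tokens / 1e6 * PRICE_PER_M_INPUT:.2f} to process as input")
# -> ~$0.14 for input alone; the tweet's $0.22 is the same ballpark
#    once output tokens are counted.
```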