Ximing Lu (@gximing)'s Twitter Profile
Ximing Lu

@gximing

PhD @uwcse @uwnlp.

ID: 965093642456023040

Link: https://gloriaximinglu.github.io/ · Joined: 18-02-2018 05:20:29

100 Tweets

791 Followers

215 Following

Ximing Lu (@gximing)

With the rise of R1, search seems out of fashion? We prove the opposite! 😎

Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize untaken, new reasoning paths that are better 💡, yet shorter in length ⚡️.
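
As a rough illustration of the idea described above, here is a minimal sketch of retrospective trace revision: revisit each step of an existing reasoning trace, roll out an alternative continuation from that point, and keep the alternative when it reaches a correct answer in fewer steps. The helpers `rollout_from` and `is_correct` are hypothetical stand-ins I introduced for illustration; the actual algorithm's MCTS-style search and value estimation are not shown.

```python
# Hypothetical sketch of retrospective trace revision (not the paper's API).
from typing import Callable, List

def retro_revise(
    trace: List[str],                                # original reasoning steps
    rollout_from: Callable[[List[str]], List[str]],  # model continues a prefix
    is_correct: Callable[[List[str]], bool],         # checks the final answer
) -> List[str]:
    """Return the shortest correct trace found by revising each step."""
    best = trace
    for i in range(len(trace)):
        prefix = trace[:i]
        # Explore an untaken path: re-generate everything after step i.
        alternative = prefix + rollout_from(prefix)
        # Keep the alternative only if it is both correct and shorter.
        if is_correct(alternative) and len(alternative) < len(best):
            best = alternative
    return best
```
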
Niklas Muennighoff (@muennighoff)

Finetuning on raw DeepSeek R1 reasoning traces makes models overthink. One of our early s1 versions was overthinking so much that it questioned the purpose of math when simply asked what 1+1 is 😁 Retro-Search by Ximing Lu & team reduces overthinking + improves performance!

David Acuna (@davidjesusacu)

What if longer reasoning isn’t always better, and blind shortening doesn’t always work? In our latest work, we use search as an effective means to reduce both overthinking and underthinking, synthesizing reasoning trajectories that are efficient and insightful. Check it out! 👇

Hyunwoo Kim (@hyunw_kim)

Humans backtrack to the point where we should've made a better decision. How do we do this? We search and simulate alternative paths that might have led to better outcomes. Our 🌈 Retro-Search mimics this process, empowering models to achieve SOTA performance AND efficient reasoning in math 🌟

The AI Timeline (@theaitimeline)

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

Author's Explanation:
x.com/GXiming/status…

While distilling reasoning paths from large models can boost smaller models, these paths are often inefficient. Retro-Search is a new algorithm designed to
Johannes Hagemann (@johannes_hage)

if no one else is showing that RL isn't just eliciting latent behavior already learned in pretraining, but is actually a new scaling paradigm, nvidia has to do it themselves

Shizhe Diao (@shizhediao)

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering
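
For intuition only, here is a schematic of what scaling RL training to thousands of steps over a large problem pool might look like. Every name here (`generate`, `verify`, `update`) is a placeholder I introduced; this is not ProRL's implementation, which the tweet does not detail.

```python
# Schematic prolonged-RL loop; every callable is a hypothetical placeholder.
import random
from typing import Callable, List, Sequence

def prolonged_rl(
    policy: object,
    problems: Sequence[str],                  # the thread mentions >130k problems
    generate: Callable[[object, str], str],   # sample a reasoning trace
    verify: Callable[[str, str], float],      # verifiable reward (answer check)
    update: Callable[[object, List[str], List[float]], None],  # one RL step
    num_steps: int = 2000,                    # ">2k steps" from the tweet
    batch_size: int = 64,
) -> object:
    pool = list(problems)
    for _ in range(num_steps):
        batch = random.sample(pool, batch_size)
        traces = [generate(policy, p) for p in batch]
        rewards = [verify(p, t) for p, t in zip(batch, traces)]
        update(policy, traces, rewards)       # e.g. a policy-gradient-style step
    return policy
```
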
Jaehun Jung (@jaehunjung_com)

Data curation is crucial for LLM reasoning, but how do we know if our dataset is not overfit to one benchmark and generalizes to unseen distributions? 🤔

𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵
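
One concrete way to operationalize this, offered purely as an assumed illustration rather than the paper's definition: score a dataset by the mean pairwise cosine distance between example embeddings, so tight clusters score low and spread-out datasets score high.

```python
# Illustrative diversity proxy: mean pairwise cosine distance of embeddings.
# This specific measure is my assumption, not necessarily the paper's.
import numpy as np

def embedding_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance over the rows of an (n, d) matrix."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                      # (n, n) cosine similarities
    n = embeddings.shape[0]
    off_diagonal = sims.sum() - np.trace(sims)    # drop self-similarities
    mean_similarity = off_diagonal / (n * (n - 1))
    return 1.0 - float(mean_similarity)           # higher = more diverse

# A tight cluster scores near 0; well-spread embeddings score higher.
rng = np.random.default_rng(0)
print(embedding_diversity(rng.normal(size=(100, 32))))
```
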
Ximing Lu (@gximing)

What happens when you ✨scale up RL✨? In our new work, Prolonged RL, we significantly scale RL training to >2k steps and >130k problems—and observe exciting, non-saturating gains as we spend more compute 🚀.

David Acuna (@davidjesusacu)

What kind of data diversity helps reasoning models to generalize better — and how can we get more of it 🧐? 👇 Read on to see what we found! ✨

Jaewoo Ahn (@ahnjaewoo2)

🚨New Paper Alert🚨 Excited to share our new video game benchmark, "Orak"! 🕹️ It was a thrilling experience to test whether LLM/VLM agents can solve real video games 🎮 Looking forward to continuing my research on LLM/VLM-based game agents with Krafton AI!