Ximing Lu (@gximing)'s Twitter Profile
Ximing Lu

@gximing

PhD @uwcse @uwnlp.

ID: 965093642456023040

Link: https://gloriaximinglu.github.io/ · Joined: 18-02-2018 05:20:29

100 Tweets

791 Followers

215 Following

Ximing Lu (@gximing)

With the rise of R1, search seems out of fashion? We prove the opposite! 😎

Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize untaken, new reasoning paths that are better 💡, yet shorter in length ⚡️.
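
As a rough illustration of the idea described above, here is a minimal sketch of retrospective trace revision: revisit each step of an existing reasoning trace, roll out an alternative continuation from that point, and keep the alternative when it reaches a correct answer in fewer steps. The helpers `rollout_from` and `is_correct` are hypothetical stand-ins I introduced for illustration; the actual algorithm's MCTS-style search and value estimation are not shown.

```python
# Hypothetical sketch of retrospective trace revision (not the paper's API).
from typing import Callable, List

def retro_revise(
    trace: List[str],                                # original reasoning steps
    rollout_from: Callable[[List[str]], List[str]],  # model continues a prefix
    is_correct: Callable[[List[str]], bool],         # checks the final answer
) -> List[str]:
    """Return the shortest correct trace found by revising each step."""
    best = trace
    for i in range(len(trace)):
        prefix = trace[:i]
        # Explore an untaken path: re-generate everything after step i.
        alternative = prefix + rollout_from(prefix)
        # Keep the alternative only if it is both correct and shorter.
        if is_correct(alternative) and len(alternative) < len(best):
            best = alternative
    return best
```
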
Niklas Muennighoff (@muennighoff)

Finetuning on raw DeepSeek R1 reasoning traces makes models overthink. One of our early s1 versions was overthinking so much that it questioned the purpose of math when simply asked what 1+1 is 😁 Retro-Search by Ximing Lu & team reduces overthinking + improves performance!

David Acuna (@davidjesusacu)

What if longer reasoning isn’t always better, and blind shortening doesn’t always work? In our latest work, we use search as an effective means to reduce both overthinking and underthinking, synthesizing reasoning trajectories that are efficient and insightful. Check it out! 👇

Hyunwoo Kim (@hyunw_kim)

Humans backtrack to the point where we should've made a better decision. How do we do this? We search and simulate alternative paths that might have led to better outcomes. Our 🌈 Retro-Search mimics this process, empowering models to achieve SOTA performance AND efficient reasoning in math 🌟

The AI Timeline (@theaitimeline)

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

Author's Explanation:
x.com/GXiming/status…

While distilling reasoning paths from large models can boost smaller models, these paths are often inefficient. Retro-Search is a new algorithm designed to
Johannes Hagemann (@johannes_hage)

if no one else is showing that RL isn't just eliciting latent behavior already learned in pretraining, but is actually a new scaling paradigm, nvidia has to do it themselves

Shizhe Diao (@shizhediao)

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering
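
For intuition only, here is a schematic of what scaling RL training to thousands of steps over a large problem pool might look like. Every name here (`generate`, `verify`, `update`) is a placeholder I introduced; this is not ProRL's implementation, which the tweet does not detail.

```python
# Schematic prolonged-RL loop; every callable is a hypothetical placeholder.
import random
from typing import Callable, List, Sequence

def prolonged_rl(
    policy: object,
    problems: Sequence[str],                  # the thread mentions >130k problems
    generate: Callable[[object, str], str],   # sample a reasoning trace
    verify: Callable[[str, str], float],      # verifiable reward (answer check)
    update: Callable[[object, List[str], List[float]], None],  # one RL step
    num_steps: int = 2000,                    # ">2k steps" from the tweet
    batch_size: int = 64,
) -> object:
    pool = list(problems)
    for _ in range(num_steps):
        batch = random.sample(pool, batch_size)
        traces = [generate(policy, p) for p in batch]
        rewards = [verify(p, t) for p, t in zip(batch, traces)]
        update(policy, traces, rewards)       # e.g. a policy-gradient-style step
    return policy
```
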
Jaehun Jung (@jaehunjung_com)

Data curation is crucial for LLM reasoning, but how do we know if our dataset is not overfit to one benchmark and generalizes to unseen distributions? 🤔

𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵
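
One concrete way to operationalize this, offered purely as an assumed illustration rather than the paper's definition: score a dataset by the mean pairwise cosine distance between example embeddings, so tight clusters score low and spread-out datasets score high.

```python
# Illustrative diversity proxy: mean pairwise cosine distance of embeddings.
# This specific measure is my assumption, not necessarily the paper's.
import numpy as np

def embedding_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance over the rows of an (n, d) matrix."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                      # (n, n) cosine similarities
    n = embeddings.shape[0]
    off_diagonal = sims.sum() - np.trace(sims)    # drop self-similarities
    mean_similarity = off_diagonal / (n * (n - 1))
    return 1.0 - float(mean_similarity)           # higher = more diverse

# A tight cluster scores near 0; well-spread embeddings score higher.
rng = np.random.default_rng(0)
print(embedding_diversity(rng.normal(size=(100, 32))))
```
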
Ximing Lu (@gximing)

What happens when you ✨scale up RL✨? In our new work, Prolonged RL, we significantly scale RL training to >2k steps and >130k problems—and observe exciting, non-saturating gains as we spend more compute 🚀.

David Acuna (@davidjesusacu)

What kind of data diversity helps reasoning models to generalize better — and how can we get more of it 🧐? 👇 Read on to see what we found! ✨

Jaewoo Ahn (@ahnjaewoo2)

🚨New Paper Alert🚨 Excited to share our new video game benchmark, "Orak"! 🕹️ It was a thrilling experience to test whether LLM/VLM agents can solve real video games 🎮 Looking forward to continuing my research on LLM/VLM-based game agents with Krafton AI!