Yingru Li (@richardyrli)'s Twitter Profile
Yingru Li

@richardyrli

AI, RL, LLMs, Data Science | PhD@CUHK | ex-intern @MSFTResearch @TencentGlobal | On Job Market

ID: 2152232932

Link: https://richardli.xyz | Joined: 25-10-2013 07:28:08

549 Tweets

414 Followers

1.1K Following

Minqi Jiang (@minqijiang):

It's so fun to see RL finally work on complex real-world tasks with LLM policies, but it's increasingly clear that we lack an understanding of how RL fine-tuning leads to generalization.

In the same week, we got two (awesome) papers:

Absolute Zero Reasoner: Improvements on code
Alex Dimakis (@alexgdimakis):

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. 

In the "One Training example" paper 
the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and
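
For context, a minimal sketch of the "Group" scoring GRPO refers to, assuming binary correctness rewards over the 8 attempts (the function name and the 1e-8 stabilizer are illustrative choices, not from either paper):

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each rollout relative to its own group, GRPO-style.

    rewards: one scalar reward per rollout in the group
    (here G = 8 attempts at the same single question).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    baseline = rewards.mean()        # the group mean is the baseline
    scale = rewards.std() + 1e-8     # normalize by the group's spread
    return (rewards - baseline) / scale

# One training example, 8 sampled attempts, binary correctness rewards:
print(group_relative_advantages([0, 1, 0, 0, 1, 0, 0, 0]))
# Correct attempts get positive advantages, incorrect ones negative, so
# the update pushes the policy toward whichever attempts succeeded.
```
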
Ruoyu Sun (@ruoyusun_ui):

Neural nets' Hessians are often **nearly block diagonal**! There is little understanding of when and why this happens. We provide one of the first theoretical analyses, using random matrix theory. Somewhat unexpectedly, we find that one primary driver is a large number of classes C.
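
The phenomenon is easy to check numerically in the simplest case. The sketch below (an illustration, not the paper's code) uses softmax regression, where the Hessian over the flattened C x d weight matrix splits into C x C class-pair blocks of size d x d, and compares average entry magnitudes across blocks:

```python
import torch

torch.manual_seed(0)
d, C, n = 5, 4, 64                     # input dim, classes, samples
X = torch.randn(n, d)
y = torch.randint(0, C, (n,))

def loss_fn(w_flat):
    # Softmax regression: logits = X W^T with W of shape (C, d).
    logits = X @ w_flat.view(C, d).T
    return torch.nn.functional.cross_entropy(logits, y)

w0 = torch.randn(C * d)
H = torch.autograd.functional.hessian(loss_fn, w0)   # (C*d, C*d)

# Mean |entry| inside each (class c, class c') block of size d x d.
block_mag = H.view(C, d, C, d).abs().mean(dim=(1, 3))
print(block_mag)
# The diagonal (c == c') dominates: the Hessian is nearly block diagonal,
# and the off-diagonal blocks shrink further as C grows.
```
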

Zhengyang Tang (@zhengyang_42):

Thrilled to share our paper "ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling" has been accepted by Operations Research! šŸŽ‰

This is the FIRST LLM paper in the 70+ year history of this prestigious journal. Our framework improves modeling
Ge Zhang (@gezhang86038849):

[1/n] šŸš€ Thrilled to unveil our latest breakthrough: AttentionInfluence! A groundbreaking, training-free, zero-supervision approach for selecting reasoning-rich pretraining data—just by masking attention heads! ✨ No labels. No retraining. A mere pretrained 1.3B-parameter model
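
No code is given in the tweet, but the selection rule it describes could look roughly like the sketch below. The per-sample losses here are random placeholders standing in for real forward passes with and without the masked heads, and the scoring rule is my reading of the one-line description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample losses from a small pretrained model, once with
# all attention heads active and once with selected heads masked. In the
# real method these come from forward passes; here they are placeholders.
loss_full = rng.uniform(2.0, 4.0, size=1000)
loss_masked = loss_full + rng.uniform(0.0, 1.0, size=1000)

# Samples whose loss degrades most when the heads are masked depend most
# on those heads; per my reading, these are kept as "reasoning-rich" data.
influence = loss_masked - loss_full
top_k = np.argsort(influence)[::-1][:100]   # keep the 100 highest-scoring
print(top_k[:10])
```
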

Jeremy Bernstein (@jxbz):

I was really grateful to have the chance to speak at Cohere Labs and ML Collective last week. My goal was to make the most helpful talk that I could have seen as a first-year grad student interested in neural network optimization. Sharing some info about the talk here...

(1/6)
Google DeepMind (@googledeepmind):

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to:

šŸ”˜ Design faster matrix multiplication algorithms
šŸ”˜ Find new solutions to open math problems
šŸ”˜ Make data centers, chip design and AI training more efficient across Google.

🧵

Alexander Novikov (@sashavnovikov):

After 1.5 years of work, I'm so excited to announce AlphaEvolve – our new LLM + evolution agent!
Learn more in the blog post: deepmind.google/discover/blog/…
White paper PDF: storage.googleapis.com/deepmind-media…
(1/2)
Dmitry Rybin (@dmitryrybin1):

We discovered a faster way to compute the product of a matrix with its transpose!

This has profound implications for data analysis, chip design, wireless communication, and LLM training!

paper: arxiv.org/abs/2505.09814

The algorithm is based on the following discovery: we can compute
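
The tweet is cut off, but the obvious starting point is that X Xᵀ is symmetric, so only about half of the n² inner products are needed. The sketch below shows just that elementary baseline, not the paper's algorithm, which saves multiplications beyond it:

```python
import numpy as np

def gram_upper(X):
    """Compute G = X @ X.T using only upper-triangular inner products.

    G is symmetric, so only n*(n+1)/2 of the n*n inner products are
    computed; the lower triangle is filled by mirroring.
    """
    n = X.shape[0]
    G = np.empty((n, n), dtype=X.dtype)
    for i in range(n):
        for j in range(i, n):
            G[i, j] = X[i] @ X[j]   # one inner product per upper entry
            G[j, i] = G[i, j]       # symmetry gives the mirror entry free
    return G

X = np.random.randn(6, 10)
assert np.allclose(gram_upper(X), X @ X.T)
```
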
William Merrill (@lambdaviking):

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? šŸ‘€

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
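
The mechanism itself is deliberately simple, which is what makes the expressivity question interesting. A hypothetical sketch (the token ids and pad_id are made up, and a real model would also need the attention mask handled):

```python
def pad_prompt(input_ids, pad_id, k):
    """Append k blank tokens to a prompt before decoding.

    The padded positions carry no content, but each transformer layer
    still computes over them, so they act as extra parallel
    computation slots at test time.
    """
    return input_ids + [pad_id] * k

prompt = [101, 2023, 2003, 1037, 3231]    # hypothetical token ids
print(pad_prompt(prompt, pad_id=0, k=4))  # 4 extra "thinking" positions
```
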
will brown (@willccbb):

i'm teaming up with Kyle Corbitt from openpipe to teach a class about agents + RL :)

we'll be teaching the class on Maven šŸ› starting june 16. as far as we know, this is the first course of its kind anywhere to bridge RL + LLM agents, and we’re really excited to share some of our
Binyuan Hui (@huybery):

Really interesting paper! It looks like our pretraining data mixture somehow led to some surprisingly useful behaviors… This also highlights the importance of code reasoning, since being able to execute code helps prevent hallucinations and is essential for achieving effective

Shenao Zhang (@shenaozhang):

🚨Check out our new paper!

"Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning"

We study š™¬š™š™®, š™š™¤š™¬, and š™¬š™š™šš™£ LLMs should self-reflect and explore at test time—questions that conventional Markovian RL cannot fully answer. A thread:🧵
Ted Zadouri (@tedzadouri):

"Pre-training was hard, inference easy; now everything is hard."-Jensen Huang. Inference drives AI progress b/c of test-time compute.

Introducing inference aware attn: parallel-friendly, high arithmetic intensity – Grouped-Tied Attn &amp; Grouped Latent Attn
Manish Shetty (@slimshetty_):

✨ NEW SWE-Agents BENCHMARK ✨

Introducing GSO: The Global Software Optimization Benchmark
 - šŸ‘©šŸ»ā€šŸ’» 100+ challenging software optimization tasks
 - šŸ›£ļø a long-horizon task w/ precise specification
 - 🐘 large code changes in Py, C, C++, ...
 - šŸ“‰ SOTA models get < 5% success!

1/
Zixuan Wang (@zzzixuanwang):

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
Shizhe Diao (@shizhediao):

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL šŸ˜Ž, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning modelšŸ’„and offering
Xinyu Zhu (@tianhongzxy):

šŸ”„The debate’s been wild: How does the reward in RLVR actually improve LLM reasoning?šŸ¤”
šŸš€Introducing our new paperšŸ‘‡
šŸ’”TL;DR: Just penalizing incorrect rolloutsāŒ — no positive reward needed — can boost LLM reasoning, and sometimes better than PPO/GRPO!

🧵[1/n]
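
Taking the TL;DR literally, the reward scheme could be as small as the sketch below (my reading, with made-up names; the paper's exact normalization may differ): incorrect rollouts are penalized, correct ones get zero.

```python
import numpy as np

def negative_only_advantages(correct):
    """Penalize incorrect rollouts; give correct ones no reward at all.

    correct: one boolean per sampled rollout for a prompt.
    """
    correct = np.asarray(correct, dtype=bool)
    return np.where(correct, 0.0, -1.0)

# Eight rollouts for one prompt: only failures carry gradient signal,
# pushing probability mass away from wrong answers rather than toward
# any particular right one.
print(negative_only_advantages(
    [True, False, False, True, False, False, False, False]))
```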