Yingru Li (@richardyrli)'s Twitter Profile
Yingru Li

@richardyrli

AI, RL, LLMs, Data Science | PhD@CUHK | ex-intern @MSFTResearch @TencentGlobal | On Job Market

ID: 2152232932

Link: https://richardli.xyz | Joined: 25-10-2013 07:28:08

549 Tweets

414 Followers

1.1K Following

Minqi Jiang (@minqijiang):

It's so fun to see RL finally work on complex real-world tasks with LLM policies, but it's increasingly clear that we lack an understanding of how RL fine-tuning leads to generalization.

In the same week, we got two (awesome) papers:

Absolute Zero Reasoner: Improvements on code
Alex Dimakis (@alexgdimakis):

"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. 

In the "One Training example" paper 
the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and
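
For context, a minimal sketch of the "Group" scoring GRPO refers to, assuming binary correctness rewards over the 8 attempts (the function name and the 1e-8 stabilizer are illustrative choices, not from either paper):

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each rollout relative to its own group, GRPO-style.

    rewards: one scalar reward per rollout in the group
    (here G = 8 attempts at the same single question).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    baseline = rewards.mean()        # the group mean is the baseline
    scale = rewards.std() + 1e-8     # normalize by the group's spread
    return (rewards - baseline) / scale

# One training example, 8 sampled attempts, binary correctness rewards:
print(group_relative_advantages([0, 1, 0, 0, 1, 0, 0, 0]))
# Correct attempts get positive advantages, incorrect ones negative, so
# the update pushes the policy toward whichever attempts succeeded.
```
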
Ruoyu Sun (@ruoyusun_ui):

Neural nets' Hessians are often **nearly block diagonal**! There is little understanding of when and why this happens. We provide one of the first theoretical analyses, using random matrix theory. Somewhat unexpectedly, we find that one primary driver is a large number of classes C.
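
The phenomenon is easy to check numerically in the simplest case. The sketch below (an illustration, not the paper's code) uses softmax regression, where the Hessian over the flattened C x d weight matrix splits into C x C class-pair blocks of size d x d, and compares average entry magnitudes across blocks:

```python
import torch

torch.manual_seed(0)
d, C, n = 5, 4, 64                     # input dim, classes, samples
X = torch.randn(n, d)
y = torch.randint(0, C, (n,))

def loss_fn(w_flat):
    # Softmax regression: logits = X W^T with W of shape (C, d).
    logits = X @ w_flat.view(C, d).T
    return torch.nn.functional.cross_entropy(logits, y)

w0 = torch.randn(C * d)
H = torch.autograd.functional.hessian(loss_fn, w0)   # (C*d, C*d)

# Mean |entry| inside each (class c, class c') block of size d x d.
block_mag = H.view(C, d, C, d).abs().mean(dim=(1, 3))
print(block_mag)
# The diagonal (c == c') dominates: the Hessian is nearly block diagonal,
# and the off-diagonal blocks shrink further as C grows.
```
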

Zhengyang Tang (@zhengyang_42):

Thrilled to share our paper "ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling" has been accepted by Operations Research! šŸŽ‰

This is the FIRST LLM paper in the 70+ year history of this prestigious journal. Our framework improves modeling
Ge Zhang (@gezhang86038849):

[1/n] šŸš€ Thrilled to unveil our latest breakthrough: AttentionInfluence! A groundbreaking, training-free, zero-supervision approach for selecting reasoning-rich pretraining data—just by masking attention heads! ✨ No labels. No retraining. A mere pretrained 1.3B-parameter model
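
No code is given in the tweet, but the selection rule it describes could look roughly like the sketch below. The per-sample losses here are random placeholders standing in for real forward passes with and without the masked heads, and the scoring rule is my reading of the one-line description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample losses from a small pretrained model, once with
# all attention heads active and once with selected heads masked. In the
# real method these come from forward passes; here they are placeholders.
loss_full = rng.uniform(2.0, 4.0, size=1000)
loss_masked = loss_full + rng.uniform(0.0, 1.0, size=1000)

# Samples whose loss degrades most when the heads are masked depend most
# on those heads; per my reading, these are kept as "reasoning-rich" data.
influence = loss_masked - loss_full
top_k = np.argsort(influence)[::-1][:100]   # keep the 100 highest-scoring
print(top_k[:10])
```
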

Jeremy Bernstein (@jxbz):

I was really grateful to have the chance to speak at Cohere Labs and ML Collective last week. My goal was to make the most helpful talk that I could have seen as a first-year grad student interested in neural network optimization. Sharing some info about the talk here...

(1/6)
Google DeepMind (@googledeepmind):

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to:

šŸ”˜ Design faster matrix multiplication algorithms
šŸ”˜ Find new solutions to open math problems
šŸ”˜ Make data centers, chip design and AI training more efficient across Google.

🧵

Alexander Novikov (@sashavnovikov):

After 1.5 years of work, I'm so excited to announce AlphaEvolve – our new LLM + evolution agent!
Learn more in the blog post: deepmind.google/discover/blog/…
White paper PDF: storage.googleapis.com/deepmind-media…
(1/2)
Dmitry Rybin (@dmitryrybin1):

We discovered a faster way to compute the product of a matrix with its transpose!

This has profound implications for data analysis, chip design, wireless communication, and LLM training!

paper: arxiv.org/abs/2505.09814

The algorithm is based on the following discovery: we can compute
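
The tweet is cut off, but the obvious starting point is that X Xᵀ is symmetric, so only about half of the n² inner products are needed. The sketch below shows just that elementary baseline, not the paper's algorithm, which saves multiplications beyond it:

```python
import numpy as np

def gram_upper(X):
    """Compute G = X @ X.T using only upper-triangular inner products.

    G is symmetric, so only n*(n+1)/2 of the n*n inner products are
    computed; the lower triangle is filled by mirroring.
    """
    n = X.shape[0]
    G = np.empty((n, n), dtype=X.dtype)
    for i in range(n):
        for j in range(i, n):
            G[i, j] = X[i] @ X[j]   # one inner product per upper entry
            G[j, i] = G[i, j]       # symmetry gives the mirror entry free
    return G

X = np.random.randn(6, 10)
assert np.allclose(gram_upper(X), X @ X.T)
```
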
William Merrill (@lambdaviking):

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? šŸ‘€

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
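
The mechanism itself is deliberately simple, which is what makes the expressivity question interesting. A hypothetical sketch (the token ids and pad_id are made up, and a real model would also need the attention mask handled):

```python
def pad_prompt(input_ids, pad_id, k):
    """Append k blank tokens to a prompt before decoding.

    The padded positions carry no content, but each transformer layer
    still computes over them, so they act as extra parallel
    computation slots at test time.
    """
    return input_ids + [pad_id] * k

prompt = [101, 2023, 2003, 1037, 3231]    # hypothetical token ids
print(pad_prompt(prompt, pad_id=0, k=4))  # 4 extra "thinking" positions
```
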
will brown (@willccbb):

i'm teaming up with Kyle Corbitt from openpipe to teach a class about agents + RL :)

we'll be teaching the class on Maven šŸ› starting june 16. as far as we know, this is the first course of its kind anywhere to bridge RL + LLM agents, and we’re really excited to share some of our
Binyuan Hui (@huybery):

Really interesting paper! It looks like our pretraining data mixture somehow led to some surprisingly useful behaviors… This also highlights the importance of code reasoning, since being able to execute code helps prevent hallucinations and is essential for achieving effective

Shenao Zhang (@shenaozhang):

🚨Check out our new paper!

"Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning"

We study š™¬š™š™®, š™š™¤š™¬, and š™¬š™š™šš™£ LLMs should self-reflect and explore at test time—questions that conventional Markovian RL cannot fully answer. A thread:🧵
Ted Zadouri (@tedzadouri):

"Pre-training was hard, inference easy; now everything is hard."-Jensen Huang. Inference drives AI progress b/c of test-time compute.

Introducing inference aware attn: parallel-friendly, high arithmetic intensity – Grouped-Tied Attn &amp; Grouped Latent Attn
Manish Shetty (@slimshetty_):

✨ NEW SWE-Agents BENCHMARK ✨

Introducing GSO: The Global Software Optimization Benchmark
 - šŸ‘©šŸ»ā€šŸ’» 100+ challenging software optimization tasks
 - šŸ›£ļø a long-horizon task w/ precise specification
 - 🐘 large code changes in Py, C, C++, ...
 - šŸ“‰ SOTA models get < 5% success!

1/
Zixuan Wang (@zzzixuanwang):

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
Shizhe Diao (@shizhediao):

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL šŸ˜Ž, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning modelšŸ’„and offering
Xinyu Zhu (@tianhongzxy):

šŸ”„The debate’s been wild: How does the reward in RLVR actually improve LLM reasoning?šŸ¤”
šŸš€Introducing our new paperšŸ‘‡
šŸ’”TL;DR: Just penalizing incorrect rolloutsāŒ — no positive reward needed — can boost LLM reasoning, and sometimes better than PPO/GRPO!

🧵[1/n]
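
Taking the TL;DR literally, the reward scheme could be as small as the sketch below (my reading, with made-up names; the paper's exact normalization may differ): incorrect rollouts are penalized, correct ones get zero.

```python
import numpy as np

def negative_only_advantages(correct):
    """Penalize incorrect rollouts; give correct ones no reward at all.

    correct: one boolean per sampled rollout for a prompt.
    """
    correct = np.asarray(correct, dtype=bool)
    return np.where(correct, 0.0, -1.0)

# Eight rollouts for one prompt: only failures carry gradient signal,
# pushing probability mass away from wrong answers rather than toward
# any particular right one.
print(negative_only_advantages(
    [True, False, False, True, False, False, False, False]))
```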