Matthew Yang (@matthewyryang) 's Twitter Profile
Matthew Yang

@matthewyryang

MSML student @ CMU

ID: 1820131135478808576

04-08-2024 16:14:41

9 Tweets

5 Followers

76 Following

Yuxiao Qu (@quyuxiao) 's Twitter Profile Photo

🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"!

🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it!

Website: cohenqu.github.io/mrt.github.io/

🧵[1/9]
Amrith Setlur (@setlur_amrith) 's Twitter Profile Photo

Scaling test-time compute is fine 😒 but are we making good use of it? 🤔
We try to answer this question in our new work: arxiv.org/pdf/2503.07572
TLDR;
🚀 *Optimizing* test-time compute  = RL with dense (progress) rewards = minimizing regret over long CoT episodes  😲
🧵⤵️
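
The TLDR above compresses the paper's framing into one line: optimizing test-time compute = RL with dense progress rewards = minimizing regret over long CoT episodes. As a rough, hypothetical illustration of the "dense progress reward" part only (not the authors' implementation), each chain-of-thought chunk could be scored by how much it changes an estimated probability of eventually answering correctly; `estimate_success_prob` below is an assumed helper, e.g. the empirical accuracy of answers sampled after the prefix.

```python
from typing import Callable, List


def progress_rewards(
    chunks: List[str],
    estimate_success_prob: Callable[[str], float],  # assumed helper: P(correct | CoT prefix)
) -> List[float]:
    """Dense reward for chunk i = change in estimated success probability
    after appending chunk i to the running chain-of-thought prefix."""
    rewards: List[float] = []
    prefix = ""
    prev = estimate_success_prob(prefix)  # success probability before any reasoning
    for chunk in chunks:
        prefix += chunk
        cur = estimate_success_prob(prefix)
        rewards.append(cur - prev)  # positive if this chunk made progress toward the answer
        prev = cur
    return rewards
```

Summing these per-chunk rewards telescopes to the overall gain from the full chain of thought, which is one hedged way to read the tweet's link between dense progress rewards and regret over the episode.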
Aviral Kumar (@aviral_kumar2) 's Twitter Profile Photo

A lot of work focuses on test-time scaling. But we aren't scaling it optimally: simply training a long CoT doesn't mean we use it well.

My students developed "v0" of a paradigm to do this optimally by running RL with dense rewards = minimizing regret over long CoT episodes. 🧵⬇️
Po-Shen Loh (@poshenloh) 's Twitter Profile Photo

Oh my goodness. GPT-o1 got a perfect score on my Carnegie Mellon University undergraduate #math exam, taking less than a minute to solve each problem. I freshly design non-standard problems for all of my exams, and they are open-book, open-notes. (Problems included below, with links to
vitrupo (@vitrupo) 's Twitter Profile Photo

"We weren’t born to do jobs." Bill Gates says jobs are a relic of human scarcity. In a world without shortages, society will be able to produce enough—food, healthcare, services—without everyone working. The real shift won’t be economic. It’ll be reprogramming how we think

Quentin Gallouédec (@qgallouedec) 's Twitter Profile Photo

🤔 How do you explain that when we apply RL to math problems, the incorrect answers become longer than the correct ones?

We had this discussion this morning, and I'm curious to know what the community thinks about it.
Amrith Setlur (@setlur_amrith) 's Twitter Profile Photo

Introducing e3 🔥 Best <2B model on math 💪
Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩? Is RL only sharpening the base LLM distrib. 🤔 OR discovering novel strategies outside base LLM 💡? We answer these ⤵️
🚨 arxiv.org/abs/2506.09026
🚨 matthewyryang.github.io/e3/
Aviral Kumar (@aviral_kumar2) 's Twitter Profile Photo

Our view on test-time scaling has been to train models to discover algos that enable them to solve harder problems.

<a href="/setlur_amrith/">Amrith Setlur</a> &amp; <a href="/matthewyryang/">Matthew Yang</a>'s new work e3 shows how RL done with this view produces best &lt;2B LLM on math that extrapolates beyond training budget. 🧵⬇️
Amrith Setlur (@setlur_amrith) 's Twitter Profile Photo

Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over correct responses sampled by the pretrained LLM OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ tinyurl.com/rlshadis

Alexandr Wang (@alexandr_wang) 's Twitter Profile Photo

I’m excited to be the Chief AI Officer of Meta, working alongside Nat Friedman, and thrilled to be accompanied by an incredible group of people joining on the same day.

Towards superintelligence 🚀
Wan (@alibaba_wan) 's Twitter Profile Photo

🚀 Introducing Wan2.2: The World's First Open-Source MoE-Architecture Video Generation Model with Cinematic Control!
🔥 Key Innovations:
ꔷ World's First Open-Source MoE Video Model: Our Mixture-of-Experts architecture scales model capacity without increasing computational
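
The truncated bullet above states the headline property of MoE layers: capacity grows with the number of experts while per-token compute stays roughly flat, because each token is routed to only a few experts. A generic sketch of that routing pattern (not Wan2.2's actual architecture; all names here are illustrative) might look like:

```python
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)
        # Capacity grows with num_experts, but only top_k experts run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, dim)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e  # tokens whose k-th routing choice is expert e
                out[mask] += weights[mask, k, None] * self.experts[int(e)](x[mask])
        return out
```

Doubling `num_experts` in this sketch doubles parameter count, but each token still passes through only `top_k` expert MLPs, which is the sense in which MoE scales capacity without a matching increase in compute.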