Laetitia Teodorescu (@lae_teo)'s Twitter Profile
Laetitia Teodorescu

@lae_teo

Hitting LLMs with a stick at @AdaptiveML

ID: 1288890363160203270

Joined: 30-07-2020 17:33:51

103 Tweets

153 Followers

610 Following

Stella Li (@stellalisy)'s Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
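The three reward settings compared in the thread can be written down directly. A minimal Python sketch (the function names and the string-match check are my assumptions, not the paper's evaluation code):

```python
import random

def ground_truth_reward(answer: str, gold: str) -> float:
    # Standard RLVR: reward 1 only when the model's answer matches the gold label.
    return 1.0 if answer.strip() == gold.strip() else 0.0

def incorrect_reward(answer: str, gold: str) -> float:
    # Inverted signal: reward only wrong answers.
    return 1.0 - ground_truth_reward(answer, gold)

def random_reward(answer: str, gold: str, p: float = 0.5) -> float:
    # "Spurious" setting: a coin flip that ignores correctness entirely.
    return 1.0 if random.random() < p else 0.0
```

The surprising claim is that RL against `random_reward` or `incorrect_reward` still improves MATH-500 substantially, approaching the ground-truth-reward gain.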
Benjamin F Spector (@bfspector)'s Twitter Profile Photo

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces.

So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel.

Megakernels are faster & more humane. Here’s how to treat your Llamas ethically:

(Joint
Kyle Corbitt (@corbtt)'s Twitter Profile Photo

New paper! We used GRPO to train Qwen 2.5 on 32 randomly-generated Coq programs that don't compile, and it learned to prove the Riemann Hypothesis.

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Yeah I mean imagine starting from a random policy "RL is just eliciting latent behaviors present in the uniform distribution over tokens"

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

GDM releases Formal Conjectures, a Lean 4 repo of formalized open math problems (no proofs).
- built to benchmark automated theorem proving, clarify conjectures, and grow mathlib
↓ github.com/google-deepmin…

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Reminds me of this bad boy: arxiv.org/abs/2503.14286
In it, they do find that pushing the effective proportion of negative samples up improves performance
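As a rough illustration of what raising the effective proportion of negative samples could mean in a policy-gradient loss, here is a hypothetical reweighting sketch (the `neg_weight` knob and all names are mine, not the linked paper's method):

```python
import numpy as np

def reweighted_pg_loss(token_logps: np.ndarray,
                       advantages: np.ndarray,
                       neg_weight: float = 2.0) -> float:
    # Upweighting negative-advantage samples increases their effective
    # proportion in the gradient, mimicking a batch with more negatives.
    w = np.where(advantages < 0, neg_weight, 1.0)
    return float(-(w * advantages * token_logps).mean())
```

With `neg_weight > 1`, negative samples contribute proportionally more to the update than their raw count in the batch would suggest.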
Shenzhi Wang🌟 (@shenzhiwang_thu)'s Twitter Profile Photo

🚨Beyond 80/20 in LLM reasoning🚨Dropping 80% low-entropy tokens in RL greatly boosts performance
🔗arxiv.org/abs/2506.01939

🏆Zero-RL SoTA: 63.5/68.1 (AIME24), 56.7 (AIME25)
🚀Insights: 
1. RL retains base model entropy patterns
2. High-entropy tokens drive all RL improvement
⬇️
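Insight 2 suggests computing the RL loss only on the minority of high-entropy tokens. A minimal numpy sketch of that per-token selection, under my own assumptions about shapes and the cutoff rule (the paper's exact criterion may differ):

```python
import numpy as np

def high_entropy_mask(logits: np.ndarray, keep_frac: float = 0.2) -> np.ndarray:
    """logits: [T, V] per-token policy logits for one sequence.
    Returns a boolean mask over the T tokens keeping roughly the
    keep_frac highest-entropy ("forking") tokens; the policy-gradient
    loss would then be computed on these tokens only."""
    z = logits - logits.max(axis=-1, keepdims=True)        # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)        # [T]
    k = max(1, int(keep_frac * len(entropy)))
    threshold = np.sort(entropy)[-k]
    return entropy >= threshold
```

Tokens with near-uniform next-token distributions (decision points) survive the mask; confidently predicted tokens are dropped from the loss.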
Reece Keller (@rdkeller)'s Twitter Profile Photo

1/ I'm excited to share recent results from my first collaboration with the amazing Aran Nayebi and Leo Kozachkov! We show how autonomous behavior and whole-brain dynamics emerge in embodied agents with intrinsic motivation driven by world models.

Minqi Jiang (@minqijiang)'s Twitter Profile Photo

I find it amusing these posts were published a day apart.

The apparent contradiction pivots around how easily the task context can be factorized. Devin finds coding tasks tend to be less factorizable than research tasks, and therefore advocates against the multi-agent approach.
Arvind Narayanan (@random_walker)'s Twitter Profile Photo

There are two competing narratives about AI: (1) there's too much hype (2) society is being too dismissive and complacent about AI progress. I think both have a kernel of truth. In fact, they feed off of each other. The key to the paradox is to recognize that going from AI

erwan plantec (@eplantec)'s Twitter Profile Photo

Synthetic ecosystems that autonomously and continuously evolve in silico 👾 An Alifer dream we pursue with Flow-Lenia! If you are interested in complex systems with (1) emergent creatures and (2) intrinsic evolutionary dynamics, go check our new paper! A 🧵

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Enjoyed this post on open-ended agents. We need:
- cheap open-ended environments
- good methods for autocurriculum
- effective memory and online learning (from skill libraries to weight updates)
Sort of wonder why there aren't so many Voyager follow-ups

Natasha Jaques (@natashajaques)'s Twitter Profile Photo

In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum games (like poker) significantly improves performance on math and reasoning benchmarks, zero-shot. Whaaat? How does this work? We analyze the results and find

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Making sure your tensors are scaled after the residual stream (among other places) seems to help with stability and quantized training
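A hypothetical sketch of what scaling after the residual stream could look like: RMS-normalizing activations right after the residual add keeps the stream's magnitude bounded, which is the stability/quantization argument (the structure and names here are illustrative assumptions, not a specific codebase):

```python
import numpy as np

def rms_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Rescale to unit root-mean-square along the feature dimension.
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

def block(x: np.ndarray, sublayer) -> np.ndarray:
    # Normalizing *after* the residual add (post-residual scaling) bounds
    # the stream's dynamic range, which helps low-precision training.
    return rms_norm(x + sublayer(rms_norm(x)))
```

Without the outer norm, residual adds can let activation magnitudes grow layer by layer, which is exactly what hurts quantized formats with limited dynamic range.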