Laetitia Teodorescu (@lae_teo)'s Twitter Profile
Laetitia Teodorescu

@lae_teo

Hitting LLMs with a stick at @AdaptiveML

ID: 1288890363160203270

Joined: 30-07-2020 17:33:51

103 Tweets

153 Followers

610 Following

Stella Li (@stellalisy)'s Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
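The three reward settings compared in the thread can be written down directly. A minimal Python sketch (the function names and the string-match check are my assumptions, not the paper's evaluation code):

```python
import random

def ground_truth_reward(answer: str, gold: str) -> float:
    # Standard RLVR: reward 1 only when the model's answer matches the gold label.
    return 1.0 if answer.strip() == gold.strip() else 0.0

def incorrect_reward(answer: str, gold: str) -> float:
    # Inverted signal: reward only wrong answers.
    return 1.0 - ground_truth_reward(answer, gold)

def random_reward(answer: str, gold: str, p: float = 0.5) -> float:
    # "Spurious" setting: a coin flip that ignores correctness entirely.
    return 1.0 if random.random() < p else 0.0
```

The surprising claim is that RL against `random_reward` or `incorrect_reward` still improves MATH-500 substantially, approaching the ground-truth-reward gain.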
Benjamin F Spector (@bfspector)'s Twitter Profile Photo

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces.

So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel.

Megakernels are faster & more humane. Here’s how to treat your Llamas ethically:

(Joint
Kyle Corbitt (@corbtt)'s Twitter Profile Photo

New paper! We used GRPO to train Qwen 2.5 on 32 randomly-generated Coq programs that don't compile, and it learned to prove the Riemann Hypothesis.

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Yeah I mean imagine starting from a random policy "RL is just eliciting latent behaviors present in the uniform distribution over tokens"

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

GDM releases Formal Conjectures, a Lean 4 repo of formalized open math problems (no proofs).
- built to benchmark automated theorem proving, clarify conjectures, and grow mathlib
↓ github.com/google-deepmin…

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Reminds me of this bad boy: arxiv.org/abs/2503.14286
In it, they do find that pushing the effective proportion of negative samples up improves performance
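As a rough illustration of what raising the effective proportion of negative samples could mean in a policy-gradient loss, here is a hypothetical reweighting sketch (the `neg_weight` knob and all names are mine, not the linked paper's method):

```python
import numpy as np

def reweighted_pg_loss(token_logps: np.ndarray,
                       advantages: np.ndarray,
                       neg_weight: float = 2.0) -> float:
    # Upweighting negative-advantage samples increases their effective
    # proportion in the gradient, mimicking a batch with more negatives.
    w = np.where(advantages < 0, neg_weight, 1.0)
    return float(-(w * advantages * token_logps).mean())
```

With `neg_weight > 1`, negative samples contribute proportionally more to the update than their raw count in the batch would suggest.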
Shenzhi Wang🌟 (@shenzhiwang_thu)'s Twitter Profile Photo

🚨Beyond 80/20 in LLM reasoning🚨Dropping 80% low-entropy tokens in RL greatly boosts performance
🔗arxiv.org/abs/2506.01939

🏆Zero-RL SoTA: 63.5/68.1 (AIME24), 56.7 (AIME25)
🚀Insights: 
1. RL retains base model entropy patterns
2. High-entropy tokens drive all RL improvement
⬇️
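Insight 2 suggests computing the RL loss only on the minority of high-entropy tokens. A minimal numpy sketch of that per-token selection, under my own assumptions about shapes and the cutoff rule (the paper's exact criterion may differ):

```python
import numpy as np

def high_entropy_mask(logits: np.ndarray, keep_frac: float = 0.2) -> np.ndarray:
    """logits: [T, V] per-token policy logits for one sequence.
    Returns a boolean mask over the T tokens keeping roughly the
    keep_frac highest-entropy ("forking") tokens; the policy-gradient
    loss would then be computed on these tokens only."""
    z = logits - logits.max(axis=-1, keepdims=True)        # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)        # [T]
    k = max(1, int(keep_frac * len(entropy)))
    threshold = np.sort(entropy)[-k]
    return entropy >= threshold
```

Tokens with near-uniform next-token distributions (decision points) survive the mask; confidently predicted tokens are dropped from the loss.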
Reece Keller (@rdkeller)'s Twitter Profile Photo

1/ I'm excited to share recent results from my first collaboration with the amazing Aran Nayebi and Leo Kozachkov! We show how autonomous behavior and whole-brain dynamics emerge in embodied agents with intrinsic motivation driven by world models.

Minqi Jiang (@minqijiang)'s Twitter Profile Photo

I find it amusing these posts were published a day apart.

The apparent contradiction pivots around how easily the task context can be factorized. Devin finds coding tasks tend to be less factorizable than research tasks, and therefore advocates against the multi-agent approach.
Arvind Narayanan (@random_walker)'s Twitter Profile Photo

There are two competing narratives about AI: (1) there's too much hype (2) society is being too dismissive and complacent about AI progress. I think both have a kernel of truth. In fact, they feed off of each other. The key to the paradox is to recognize that going from AI

erwan plantec (@eplantec)'s Twitter Profile Photo

Synthetic ecosystems that autonomously and continuously evolve in silico 👾 An Alifer dream we pursue with Flow-Lenia! If you are interested in complex systems with (1) emergent creatures and (2) intrinsic evolutionary dynamics, go check our new paper! A 🧵

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Enjoyed this post on open-ended agents. We need:
- cheap open-ended environments
- good methods for autocurriculum
- effective memory and online learning (from skill libraries to weight updates)
Sort of wonder why there aren't so many Voyager follow-ups

Natasha Jaques (@natashajaques)'s Twitter Profile Photo

In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum games (like poker) significantly improves performance on math and reasoning benchmarks, zero-shot. Whaaat? How does this work? We analyze the results and find

Laetitia Teodorescu (@lae_teo)'s Twitter Profile Photo

Making sure your tensors are scaled after the residual stream (among other places) seems to help with stability and quantized training
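A hypothetical sketch of what scaling after the residual stream could look like: RMS-normalizing activations right after the residual add keeps the stream's magnitude bounded, which is the stability/quantization argument (the structure and names here are illustrative assumptions, not a specific codebase):

```python
import numpy as np

def rms_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Rescale to unit root-mean-square along the feature dimension.
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

def block(x: np.ndarray, sublayer) -> np.ndarray:
    # Normalizing *after* the residual add (post-residual scaling) bounds
    # the stream's dynamic range, which helps low-precision training.
    return rms_norm(x + sublayer(rms_norm(x)))
```

Without the outer norm, residual adds can let activation magnitudes grow layer by layer, which is exactly what hurts quantized formats with limited dynamic range.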