lovish (@louvishh)'s Twitter Profile
lovish

@louvishh

phding @ucl and @aiatmeta (llama team). mostly random tweets here.

ID: 1410496633725276162

Link: http://lovishmadaan.github.io · Joined: 01-07-2021 07:13:13

248 Tweets

830 Followers

795 Following

Nicholas Roberts (@nick11roberts)'s Twitter Profile Photo

📉📉NEW SCALING LAW PHENOMENON 📉📉

We find that knowledge and reasoning exhibit different scaling behaviors! 

Super excited to finally tell you all about our paper on the compute optimal scaling of skills: 
arxiv.org/pdf/2503.10061

[1/n]
lovish (@louvishh)'s Twitter Profile Photo

i love how my feed is filled with Zachary Nado burns every time a new gemini comes out. probably goes back to hibernation to build the best models again after a day.

Dieuwke Hupkes (@_dieuwke_)'s Twitter Profile Photo

So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!)

arxiv.org/abs/2504.10356

Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design

AI at Meta
Rishabh Agarwal (@agarwl_)'s Twitter Profile Photo

Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL.

Coincidentally, this is similar motivation to what we had for the NeurIPS best
Nathan Lambert (@natolambert)'s Twitter Profile Photo

The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon.

The Art of Scaling Reinforcement Learning Compute for LLMs
Khatri & Madaan et al.
Lewis Tunstall (@_lewtun)'s Twitter Profile Photo

This is the most impressive plot I've seen all year:

- Scaling RL not only works, but can be predicted from experiments run with 1/2 the target compute

- PipelineRL crushes conventional RL pipelines in terms of compute efficiency

- Many small details matter for stability &
Ross Taylor (@rosstaylor90)'s Twitter Profile Photo

This is a great paper and a real gift to the open community to surface these ablations. Open RL has been on an interesting path of “reinforce-ification” since R1. GRPO was a PPO-like method that was motivated by the need to drop the value network and rely on MC estimates (for
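
For readers unfamiliar with what "dropping the value network and relying on MC estimates" looks like in practice, here is a minimal illustrative sketch in Python (based on the standard GRPO formulation, not on this paper's code; the function name and toy rewards are made up): the rewards of a group of completions sampled for the same prompt are normalized against the group's own mean and standard deviation, which acts as the Monte Carlo baseline in place of a learned critic.

import numpy as np

# Illustrative sketch of GRPO-style group-relative advantages (assumed form
# from the original GRPO formulation, not taken from this paper's code).
def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scalar rewards for the G completions sampled for one prompt."""
    rewards = np.asarray(rewards, dtype=float)
    # Normalize by the group's own statistics instead of a learned value network.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions of one prompt, two correct and two incorrect.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [ 1., -1., -1., 1.]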

lovish (@louvishh)'s Twitter Profile Photo

finding compute for this project (and dealing with new hardware) was such a fun exercise in itself lol. can't believe we spent this much on this paper haha. rl scaling ftw 🙌

Deedy (@deedydas)'s Twitter Profile Photo

Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs.

It lays out an RL recipe, uses 400,000 GPU hrs and posits a scaling law for performance with more compute in RL, like the classic pretraining scaling laws.

Must read for AI nerds.
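
To make the scaling-law framing concrete, below is a minimal hypothetical sketch in Python (using SciPy; the sigmoidal functional form and every number are placeholders rather than the paper's actual fit or data) of fitting a compute-vs-performance curve on smaller runs and extrapolating it to a larger budget, in the spirit of predicting full-scale results from a fraction of the target compute.

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical sigmoid in log-compute: performance rises with compute and
# saturates at r_max. An assumed form for illustration, not the paper's fit.
def sigmoid_in_log_compute(log_c, r_max, log_c_mid, slope):
    return r_max / (1.0 + np.exp(-slope * (log_c - log_c_mid)))

# Placeholder (GPU-hours, pass-rate) points standing in for small-scale runs.
compute = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
pass_rate = np.array([0.22, 0.31, 0.42, 0.50, 0.55])

params, _ = curve_fit(sigmoid_in_log_compute, np.log10(compute), pass_rate,
                      p0=[0.6, 4.0, 1.0])

# Extrapolate beyond the compute used for fitting, mirroring the idea of
# predicting large-scale behaviour from cheaper experiments.
print(f"predicted pass rate at 4e5 GPU-hours: "
      f"{sigmoid_in_log_compute(np.log10(4e5), *params):.3f}")
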
Devvrit (@devvrit_khatri)'s Twitter Profile Photo

Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)