lovish (@louvishh)'s Twitter Profile
lovish

@louvishh

phding @ucl and @aiatmeta (llama team). mostly random tweets here.

ID: 1410496633725276162

Link: http://lovishmadaan.github.io · Joined: 01-07-2021 07:13:13

248 Tweets

830 Followers

795 Following

Nicholas Roberts (@nick11roberts)'s Twitter Profile Photo

📉📉NEW SCALING LAW PHENOMENON 📉📉

We find that knowledge and reasoning exhibit different scaling behaviors! 

Super excited to finally tell you all about our paper on the compute optimal scaling of skills: 
arxiv.org/pdf/2503.10061

[1/n]
lovish (@louvishh)'s Twitter Profile Photo

i love how my feed is filled with Zachary Nado burns every time a new gemini comes out. probably goes back to hibernation to build the best models again after a day.

Dieuwke Hupkes (@_dieuwke_)'s Twitter Profile Photo

So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!)

arxiv.org/abs/2504.10356

Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design

AI at Meta
Rishabh Agarwal (@agarwl_)'s Twitter Profile Photo

Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL.

Coincidentally, this is similar motivation to what we had for the NeurIPS best
Nathan Lambert (@natolambert)'s Twitter Profile Photo

The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon.

The Art of Scaling Reinforcement Learning Compute for LLMs
Khatri & Madaan et al.
Lewis Tunstall (@_lewtun)'s Twitter Profile Photo

This is the most impressive plot I've seen all year:

- Scaling RL not only works, but can be predicted from experiments run with 1/2 the target compute

- PipelineRL crushes conventional RL pipelines in terms of compute efficiency

- Many small details matter for stability &
Ross Taylor (@rosstaylor90)'s Twitter Profile Photo

This is a great paper and a real gift to the open community to surface these ablations. Open RL has been on an interesting path of “reinforce-ification” since R1. GRPO was a PPO-like method that was motivated by the need to drop the value network and rely on MC estimates (for
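
For readers unfamiliar with what "dropping the value network and relying on MC estimates" looks like in practice, here is a minimal illustrative sketch in Python (based on the standard GRPO formulation, not on this paper's code; the function name and toy rewards are made up): the rewards of a group of completions sampled for the same prompt are normalized against the group's own mean and standard deviation, which acts as the Monte Carlo baseline in place of a learned critic.

import numpy as np

# Illustrative sketch of GRPO-style group-relative advantages (assumed form
# from the original GRPO formulation, not taken from this paper's code).
def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scalar rewards for the G completions sampled for one prompt."""
    rewards = np.asarray(rewards, dtype=float)
    # Normalize by the group's own statistics instead of a learned value network.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions of one prompt, two correct and two incorrect.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [ 1., -1., -1., 1.]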

lovish (@louvishh)'s Twitter Profile Photo

finding compute for this project (and dealing with new hardware) was such a fun exercise in itself lol. can't believe we spent this much on this paper haha. rl scaling ftw 🙌

Deedy (@deedydas)'s Twitter Profile Photo

Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs.

It lays out an RL recipe, uses 400,000 GPU hrs and posits a scaling law for performance with more compute in RL, like the classic pretraining scaling laws.

Must read for AI nerds.
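
To make the scaling-law framing concrete, below is a minimal hypothetical sketch in Python (using SciPy; the sigmoidal functional form and every number are placeholders rather than the paper's actual fit or data) of fitting a compute-vs-performance curve on smaller runs and extrapolating it to a larger budget, in the spirit of predicting full-scale results from a fraction of the target compute.

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical sigmoid in log-compute: performance rises with compute and
# saturates at r_max. An assumed form for illustration, not the paper's fit.
def sigmoid_in_log_compute(log_c, r_max, log_c_mid, slope):
    return r_max / (1.0 + np.exp(-slope * (log_c - log_c_mid)))

# Placeholder (GPU-hours, pass-rate) points standing in for small-scale runs.
compute = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
pass_rate = np.array([0.22, 0.31, 0.42, 0.50, 0.55])

params, _ = curve_fit(sigmoid_in_log_compute, np.log10(compute), pass_rate,
                      p0=[0.6, 4.0, 1.0])

# Extrapolate beyond the compute used for fitting, mirroring the idea of
# predicting large-scale behaviour from cheaper experiments.
print(f"predicted pass rate at 4e5 GPU-hours: "
      f"{sigmoid_in_log_compute(np.log10(4e5), *params):.3f}")
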
Devvrit (@devvrit_khatri)'s Twitter Profile Photo

Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)