Karthik A Sankararaman 🇮🇳🇺🇸 (@karthikabinav) 's Twitter Profile
Karthik A Sankararaman 🇮🇳🇺🇸

@karthikabinav

Research/Engineering in #Algorithms, #machinelearning, #generativeAI; Long-term Affiliations: #iitm, @UMDCS, @facebook, @meta

ID: 85788359

Link: http://karthikabinavs.xyz · Joined: 28-10-2009 10:34:40

1.1K Tweets

1.1K Followers

2.2K Following

Andrew Carr (e/🤸) (@andrew_n_carr) 's Twitter Profile Photo

I often wonder how Meta did such a good job post training the Llama series of models. 

They just released a paper that gives us a good idea. 

The big challenge is that using a single reward model to align an LLM on multiple tasks fails due to reward hacking, multi-objective
AI at Meta (@aiatmeta) 's Twitter Profile Photo

📣 New paper from GenAI and Meta FAIR. CGPO uses Mixture of Judges and consistently outperforms SOTA RLHF approaches across various tasks. More details and key results in the full thread 🧵

Narendra Modi (@narendramodi) 's Twitter Profile Photo

Shri Ratan Tata Ji was a visionary business leader, a compassionate soul and an extraordinary human being. He provided stable leadership to one of India’s oldest and most prestigious business houses. At the same time, his contribution went far beyond the boardroom. He endeared

Han Fang (@han_fang_) 's Twitter Profile Photo

Excited to share a new benchmark from our team: this eval set, *Multi-IF*, enables benchmarking LLMs on Multi-Turn and Multilingual Instructions Following, which is something the LLM community has been waiting for. Link to paper: arxiv.org/abs/2410.15553

Lilian Weng (@lilianweng) 's Twitter Profile Photo

🦃 At the end of Thanksgiving holidays, I finally finished the piece on reward hacking. Not an easy one to write, phew. Reward hacking occurs when an RL agent exploits flaws in the reward function or env to maximize rewards without learning the intended behavior. This is imo a
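The definition above can be illustrated with a toy sketch: an agent gets the maximum proxy reward without ever learning the intended behavior. (All names and the scenario here are hypothetical, for illustration only, not from the linked post.)

```python
# Toy illustration of reward hacking: a flawed proxy reward
# ("no visible mess") diverges from the intended goal
# ("actually remove the mess").

def proxy_reward(state):
    # Flawed reward: only checks whether the mess is *visible*.
    return 1.0 if not state["mess_visible"] else 0.0

def clean(state):
    # Intended behavior: remove the mess entirely.
    return {"mess_exists": False, "mess_visible": False}

def cover(state):
    # Hacky behavior: hide the mess instead of removing it.
    return {"mess_exists": True, "mess_visible": False}

start = {"mess_exists": True, "mess_visible": True}

# Both policies earn the maximum proxy reward...
assert proxy_reward(clean(start)) == 1.0
assert proxy_reward(cover(start)) == 1.0

# ...but only one achieves the intended outcome.
assert clean(start)["mess_exists"] is False
assert cover(start)["mess_exists"] is True
```

An RL agent optimizing `proxy_reward` has no incentive to prefer `clean` over `cover`, which is exactly the failure mode the post describes.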

Guilherme Penedo (@gui_penedo) 's Twitter Profile Photo

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.
Sagar (@sagarcasm) 's Twitter Profile Photo

Economic reforms, quiet dignity, and unparalleled intellect. Dr. Manmohan Singh, you were truly one of a kind. Om Shanti #ManmohanSingh

Han Fang (@han_fang_) 's Twitter Profile Photo

New paper from my team: *Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization* arxiv.org/abs/2501.17974

Behnam Neyshabur (@bneyshabur) 's Twitter Profile Photo

No matter your industry, start experimenting with using foundation models in every aspect of your work and life today! Once you learn how to use them, you’ll be shocked by their impact! Adapt and thrive—evolution won’t wait.

Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick —  our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were

Percy Liang (@percyliang) 's Twitter Profile Photo

We ran Llama 4 Maverick through some HELM benchmarks. It is 1st on HELM capabilities (MMLU-Pro, GPQA, IFEval, WildBench, Omni-MATH), but…
crfm.stanford.edu/helm/capabilit…
ICLR 2025 (@iclr_conf) 's Twitter Profile Photo

Announcing the Outstanding Paper Awards at ICLR 2025! Congratulations to all the authors for their contributions! blog.iclr.cc/2025/04/22/ann…

kalomaze (@kalomaze) 's Twitter Profile Photo

VR-CLI is an obscenely powerful RL objective that was mentioned in a paper that wasn't hyped to 1/10th of the degree it deserved.
"oh, you can optimize the reasoning traces for next-token prediction in a way that generalizes WAY better..."
...casual bombshell implications.
fly51fly (@fly51fly) 's Twitter Profile Photo

[LG] Reinforcement Learning from User Feedback
E Han, J Chen, K A Sankararaman, X Peng... [Meta GenAI] (2025)
arxiv.org/abs/2505.14946
Lance Fortnow (@fortnow) 's Twitter Profile Photo

The 2025 Gödel Prize is awarded to Eshan Chattopadhyay and David Zuckerman for “Explicit two-source extractors and resilient functions”. Paper: doi.org/10.4007/annals… Favorite Theorems blog post: blog.computationalcomplexity.org/2024/07/favori…