Karthik A Sankararaman 🇮🇳🇺🇸 (@karthikabinav) 's Twitter Profile
Karthik A Sankararaman 🇮🇳🇺🇸

@karthikabinav

Research/Engineering in #Algorithms, #machinelearning, #generativeAI; Long-term Affiliations: #iitm, @UMDCS, @facebook, @meta

ID: 85788359

Link: http://karthikabinavs.xyz · Joined: 28-10-2009 10:34:40

1.1K Tweets

1.1K Followers

2.2K Following

Andrew Carr (e/🤸) (@andrew_n_carr) 's Twitter Profile Photo

I often wonder how Meta did such a good job post training the Llama series of models. 

They just released a paper that gives us a good idea. 

The big challenge is that using a single reward model to align an LLM on multiple tasks fails due to reward hacking, multi-objective
AI at Meta (@aiatmeta) 's Twitter Profile Photo

📣 New paper from GenAI and Meta FAIR. CGPO uses Mixture of Judges and consistently outperforms SOTA RLHF approaches across various tasks. More details and key results in the full thread 🧵

Narendra Modi (@narendramodi) 's Twitter Profile Photo

Shri Ratan Tata Ji was a visionary business leader, a compassionate soul and an extraordinary human being. He provided stable leadership to one of India’s oldest and most prestigious business houses. At the same time, his contribution went far beyond the boardroom. He endeared

Han Fang (@han_fang_) 's Twitter Profile Photo

Excited to share a new benchmark from our team: this eval set, *Multi-IF*, enables benchmarking LLMs on Multi-Turn and Multilingual Instructions Following, which is something the LLM community has been waiting for. Link to paper: arxiv.org/abs/2410.15553

Lilian Weng (@lilianweng) 's Twitter Profile Photo

🦃 At the end of Thanksgiving holidays, I finally finished the piece on reward hacking. Not an easy one to write, phew. Reward hacking occurs when an RL agent exploits flaws in the reward function or env to maximize rewards without learning the intended behavior. This is imo a
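The definition above can be illustrated with a toy sketch: an agent gets the maximum proxy reward without ever learning the intended behavior. (All names and the scenario here are hypothetical, for illustration only, not from the linked post.)

```python
# Toy illustration of reward hacking: a flawed proxy reward
# ("no visible mess") diverges from the intended goal
# ("actually remove the mess").

def proxy_reward(state):
    # Flawed reward: only checks whether the mess is *visible*.
    return 1.0 if not state["mess_visible"] else 0.0

def clean(state):
    # Intended behavior: remove the mess entirely.
    return {"mess_exists": False, "mess_visible": False}

def cover(state):
    # Hacky behavior: hide the mess instead of removing it.
    return {"mess_exists": True, "mess_visible": False}

start = {"mess_exists": True, "mess_visible": True}

# Both policies earn the maximum proxy reward...
assert proxy_reward(clean(start)) == 1.0
assert proxy_reward(cover(start)) == 1.0

# ...but only one achieves the intended outcome.
assert clean(start)["mess_exists"] is False
assert cover(start)["mess_exists"] is True
```

An RL agent optimizing `proxy_reward` has no incentive to prefer `clean` over `cover`, which is exactly the failure mode the post describes.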

Guilherme Penedo (@gui_penedo) 's Twitter Profile Photo

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.
Sagar (@sagarcasm) 's Twitter Profile Photo

Economic reforms, quiet dignity, and unparalleled intellect. Dr. Manmohan Singh, you were truly one of a kind. Om Shanti #ManmohanSingh

Han Fang (@han_fang_) 's Twitter Profile Photo

New paper from my team: *Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization* arxiv.org/abs/2501.17974

Behnam Neyshabur (@bneyshabur) 's Twitter Profile Photo

No matter your industry, start experimenting with using foundation models in every aspect of your work and life today! Once you learn how to use them, you’ll be shocked by their impact! Adapt and thrive—evolution won’t wait.

Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
AI at Meta (@aiatmeta) 's Twitter Profile Photo

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick —  our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were

Percy Liang (@percyliang) 's Twitter Profile Photo

We ran Llama 4 Maverick through some HELM benchmarks. It is 1st on HELM capabilities (MMLU-Pro, GPQA, IFEval, WildBench, Omni-MATH), but…
crfm.stanford.edu/helm/capabilit…
ICLR 2025 (@iclr_conf) 's Twitter Profile Photo

Announcing the Outstanding Paper Awards at ICLR 2025! Congratulations to all the authors for their contributions! blog.iclr.cc/2025/04/22/ann…

kalomaze (@kalomaze) 's Twitter Profile Photo

VR-CLI is an obscenely powerful RL objective that was mentioned in a paper that wasn't hyped to 1/10th of the degree it deserved.
"oh, you can optimize the reasoning traces for next-token prediction in a way that generalizes WAY better..."
...casual bombshell implications.
fly51fly (@fly51fly) 's Twitter Profile Photo

[LG] Reinforcement Learning from User Feedback
E Han, J Chen, K A Sankararaman, X Peng... [Meta GenAI] (2025)
arxiv.org/abs/2505.14946
Lance Fortnow (@fortnow) 's Twitter Profile Photo

The 2025 Gödel Prize is awarded to Eshan Chattopadhyay and David Zuckerman for “Explicit two-source extractors and resilient functions”. Paper: doi.org/10.4007/annals… Favorite Theorems blog post: blog.computationalcomplexity.org/2024/07/favori…