Anirudh Buvanesh (@anirudhbuvanesh) Twitter Tweets • TwiCopy

Anirudh Buvanesh

@anirudhbuvanesh

ID: 1580609992784113665

Joined: 13-10-2022 17:23:16

4 Tweets

5 Followers

127 Following

Laurent Charlin

@lcharlin

a year ago

Introducing a framework for end-to-end discovery of data structures—no predefined algorithms or hand-tuning needed. Work led by Omar Salemohamed. More details below. arxiv.org/abs/2411.03253

Likes: 18 · Replies: 1 · Retweets: 8

Thrilled to share our new work EARL 🚀 1⃣ An AR + RL image editing model that outperforms diffusion baselines w/ 5x less data. 2⃣ First systematic SFT vs RL study in image editing → RL post-training shines on complex edits where paired data is scarce. See thread for details👇

Likes: 8 · Replies: 0 · Retweets: 3

Milad Aghajohari

@maghajohari

7 months ago

Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales with O(n) compute, not O(n^2), with O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with a 96K thought budget, ~2X accuracy 🧵

Likes: 919 · Replies: 14 · Retweets: 200

Johan S. Obando 👍🏽

@johanobandoc

7 months ago

1/3 🥳Excited to share our new paper ‘Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents’! Project your features onto a product of simplices → sparse, stable reps, stronger grads, faster learning. 🧵For more details, check out Pablo’s thread 👇

Likes: 43 · Replies: 2 · Retweets: 15

Jatin Prakash

@bicycleman15

6 months ago

New paper alert 🚨 What if I told you there is an architecture that provides a _knob_ to control quality-efficiency trade-offs directly at test-time? Introducing Compress & Attend Transformers (CATs) that provide you exactly this! 🧵(1/n) 👇

Likes: 24 · Replies: 1 · Retweets: 11

Muqeeth

@muqeeth10

5 months ago

New preprint! Learning Robust Social Strategies with Large Language Models. We apply multi-agent RL finetuning to train LLMs that achieve cooperative and non-exploitable behavior in social dilemmas for the first time. 📄 arxiv.org/abs/2511.19405 🧵 ⬇️ (1/8)

Likes: 21 · Replies: 1 · Retweets: 14

Anirudh Buvanesh

Laurent Charlin

Ankur Sikarwar

Milad Aghajohari

Johan S. Obando 👍🏽

Jatin Prakash

Muqeeth