Satwik Bhattamishra (@satwik1729)'s Twitter Profile
Satwik Bhattamishra

@satwik1729

CS PhD student at Oxford, SR at Google | Ex - Research fellow at Microsoft Research India, Undergrad at BITS Pilani

ID: 1204850991645876225

Website: https://satwikb.com/ | Joined: 11-12-2019 19:50:45

197 Tweets

649 Followers

743 Following

Prateek Yadav (@prateeky2806)

I'm on the job market! Please reach out if you are looking to hire someone to work on RLHF, efficiency, MoE/modular models, synthetic data, test-time compute, or other phases of pre/post-training. If you are not hiring, I would appreciate a retweet! More details 👇

Nagarajan Natarajan (@naga86)

Microsoft Research India is excited to announce applications are open for our Research Fellow program (deadline 15th Feb 2025). Details of the program and the application are here: 🔗 Research Fellow program: aka.ms/msrirf Microsoft Research

Richard Futrell (@rljfutrell)

Language Models learn a lot about language, much more than we expected, without much built-in structure. This matters for linguistics and opens up enormous opportunities. So should we just throw out linguistics? No! Quite the opposite: we need theory and structure.

Arkil Patel (@arkil_patel)

Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨

Work w/ fantastic advisors 🇺🇦 Dzmitry Bahdanau and Siva Reddy

Thread 🧵:
Amey Agrawal (@agrawalamey12)

Super long-context models with context windows spanning millions of tokens are becoming commonplace (Google DeepMind Gemini, xAI Grok 3, Qwen Qwen2.5). But efficiently serving these models is tough, especially alongside short requests. Head-of-Line (HOL) blocking becomes
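To make the HOL-blocking point concrete, here is a tiny, self-contained simulation (an illustrative sketch with made-up numbers, not code from the linked work): under first-come-first-served scheduling, one multi-million-token prefill that arrives just ahead of a few short prompts forces every short prompt to wait behind it.

```python
# Toy FCFS simulation of head-of-line (HOL) blocking in an LLM serving queue.
# All numbers are made up for illustration; they are not from the tweet or paper.

def simulate_fcfs(requests, prefill_tokens_per_sec=50_000):
    """Serve requests one at a time in arrival order; return per-request latency."""
    clock = 0.0
    latencies = {}
    for name, arrival, prompt_tokens in sorted(requests, key=lambda r: r[1]):
        start = max(clock, arrival)
        service = prompt_tokens / prefill_tokens_per_sec  # prefill time only
        clock = start + service
        latencies[name] = clock - arrival
    return latencies

requests = [
    ("long-context", 0.0, 2_000_000),  # one huge prompt arrives first
    ("short-1", 0.1, 500),
    ("short-2", 0.2, 500),
    ("short-3", 0.3, 500),
]

for name, latency in simulate_fcfs(requests).items():
    print(f"{name:>12}: {latency:6.2f} s")
# Each short request waits ~40 s behind the long prefill even though its own
# service time is ~0.01 s -- that is head-of-line blocking.
```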
Amey Agrawal (@agrawalamey12)

Super excited to share another incredible system that we have built over the past two years! Training giant foundation models (like Llama-3 405B) costs a FORTUNE 💰 (millions of dollars)! Optimizing the training "recipe" (parallelism, memory tricks, etc.) is critical but

Charlie London (@charlielondon02)

I did suggest leaving the bracketed part out of the title, but this is Yoonsoo's baby, and it's also very good, so highly recommend reading!

Arkil Patel (@arkil_patel)

Thoughtology paper is out! 🔥🐋

We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and find several surprising and interesting phenomena!

Incredible effort by the entire team!

🌐: mcgill-nlp.github.io/thoughtology/
abakalova (@abakalova13175)

How do LLMs perform in-context learning?

LLMs can infer tasks from just a few examples in prompts - but how? Our new preprint proposes a two-step mechanism: first contextualize, then aggregate. 🧵
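As a purely illustrative reading of "first contextualize, then aggregate" (a toy sketch of my own, not the preprint's mechanism), imagine each in-context example first being turned into local statistics that pair its input with its label, and the final position then pooling those statistics to recover the shared task:

```python
# Toy, hypothetical illustration of a "contextualize, then aggregate" two-step
# view of in-context learning. This is NOT the preprint's mechanism; it only
# makes the two steps concrete on a linear-regression-in-context task.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])          # the task the prompt implicitly defines
xs = rng.normal(size=(8, 2))            # 8 in-context demonstrations
ys = xs @ w_true                        # their labels

# Step 1 -- contextualize: each demonstration is enriched with its own label,
# producing purely local per-example statistics.
per_example_xxT = [np.outer(x, x) for x in xs]
per_example_xy = [y * x for x, y in zip(xs, ys)]

# Step 2 -- aggregate: pool the local statistics across all demonstrations and
# read off the shared task (here via the normal equations).
XtX = sum(per_example_xxT)
Xty = sum(per_example_xy)
w_inferred = np.linalg.solve(XtX, Xty)

print("true task:    ", w_true)
print("inferred task:", np.round(w_inferred, 3))  # recovers w_true (no noise added)
```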
Kabir (@kabirahuja004)

📢 New Paper!

Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story's world 🌎

W/ Melanie Sclar, and tsvetshop

1/n
William Merrill (@lambdaviking)

Padding a transformer's input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
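For readers outside the expressivity literature, "padding" here just means appending content-free filler tokens so the transformer runs extra parallel computation over positions that carry no new information. A minimal sketch of those mechanics with Hugging Face transformers (the model choice and using the EOS token as the blank filler are my assumptions, not details from the paper):

```python
# Minimal sketch of "padding as test-time compute": append k content-free
# filler tokens after the prompt so the transformer gets k extra positions of
# computation before it must answer. Model name and the choice of EOS as the
# blank/filler token are assumptions for illustration, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: Is 91 a prime number? A:"
inputs = tokenizer(prompt, return_tensors="pt")

k = 32  # number of blank/padding tokens to append
blank_id = tokenizer.eos_token_id
padded_ids = torch.cat(
    [inputs.input_ids, torch.full((1, k), blank_id, dtype=torch.long)], dim=1
)
padded_mask = torch.ones_like(padded_ids)

# Generation starts after the padded positions, so the model has already run
# its attention/MLP layers over k extra (information-free) positions.
out = model.generate(input_ids=padded_ids, attention_mask=padded_mask,
                     max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0][padded_ids.shape[1]:]))
```

Whether the extra positions actually help depends on how the model was trained; the snippet only shows the mechanics of the padded forward pass.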
Kulin Shah (@shahkulin98)

Excited about this work, where we show that a simple algorithm of inverting candidate samples boosts the performance of reward guidance in diffusion models, both in experiments and in theory! Check out the thread by Aayush for more details.

David Chiang (@davidweichiang)

New on arXiv: Knee-Deep in C-RASP, by Andy J Yang, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
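For context, C-RASP is a counting-based variant of RASP used as a symbolic proxy for what transformers can express. As a rough illustration of the counting style of computation it captures (my own toy example, not one from the paper), here is a "majority" check built from running prefix counts:

```python
# Toy illustration (my own, not from the paper) of the counting style of
# computation that C-RASP formalizes: each position counts how often each
# symbol has appeared so far, and the output compares those counts.
def majority_by_prefix_counts(s: str) -> bool:
    """Accept iff strictly more 'a's than 'b's -- a classic counting property."""
    count_a = count_b = 0
    accept = False
    for ch in s:                    # one "position" at a time, left to right
        count_a += ch == "a"        # count of prefix positions satisfying Q_a
        count_b += ch == "b"        # count of prefix positions satisfying Q_b
        accept = count_a > count_b  # comparison of counts at the last position
    return accept

for s in ["aab", "abab", "bba", "aaabb"]:
    print(s, majority_by_prefix_counts(s))
# aab True, abab False, bba False, aaabb True
```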
Kulin Shah (@shahkulin98)

Thrilled to share that our work received the Outstanding Paper Award at ICML! I will be giving the oral presentation on Tuesday at 4:15 PM. Jaeyeon (Jay) Kim @ICML and I will both be at the poster session shortly after the oral presentation. Please attend if possible!

Amey Agrawal (@agrawalamey12)

The bitter lesson of AI infra: The hardest part about building faster LLM inference systems is not designing the systems, but rather it is evaluating if the system is actually faster! 🤔

This graph from a recent top systems venue paper about long-context serving shows average