Satwik Bhattamishra (@satwik1729)'s Twitter Profile
Satwik Bhattamishra

@satwik1729

CS PhD student at Oxford, SR at Google | Ex - Research fellow at Microsoft Research India, Undergrad at BITS Pilani

ID: 1204850991645876225

Website: https://satwikb.com/ | Joined: 11-12-2019 19:50:45

197 Tweets

649 Followers

743 Following

Prateek Yadav (@prateeky2806)

I'm on the job market! Please reach out if you are looking to hire someone to work on RLHF, efficiency, MoE/modular models, synthetic data, test-time compute, or other phases of pre/post-training. If you are not hiring, I would appreciate a retweet! More details 👇

Nagarajan Natarajan (@naga86)

Microsoft Research India is excited to announce applications are open for our Research Fellow program (deadline 15th Feb 2025). Details of the program and the application are here: 🔗 Research Fellow program: aka.ms/msrirf Microsoft Research

Richard Futrell (@rljfutrell)

Language Models learn a lot about language, much more than we expected, without much built-in structure. This matters for linguistics and opens up enormous opportunities. So should we just throw out linguistics? No! Quite the opposite: we need theory and structure.

Arkil Patel (@arkil_patel)

Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨

Work w/ fantastic advisors 🇺🇦 Dzmitry Bahdanau and Siva Reddy

Thread 🧵:
Amey Agrawal (@agrawalamey12)

Super long-context models with context windows spanning millions of tokens are becoming commonplace (Google DeepMind Gemini, xAI Grok 3, Qwen Qwen2.5). But efficiently serving these models is tough, especially alongside short requests. Head-of-Line (HOL) blocking becomes
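To make the HOL-blocking point concrete, here is a tiny, self-contained simulation (an illustrative sketch with made-up numbers, not code from the linked work): under first-come-first-served scheduling, one multi-million-token prefill that arrives just ahead of a few short prompts forces every short prompt to wait behind it.

```python
# Toy FCFS simulation of head-of-line (HOL) blocking in an LLM serving queue.
# All numbers are made up for illustration; they are not from the tweet or paper.

def simulate_fcfs(requests, prefill_tokens_per_sec=50_000):
    """Serve requests one at a time in arrival order; return per-request latency."""
    clock = 0.0
    latencies = {}
    for name, arrival, prompt_tokens in sorted(requests, key=lambda r: r[1]):
        start = max(clock, arrival)
        service = prompt_tokens / prefill_tokens_per_sec  # prefill time only
        clock = start + service
        latencies[name] = clock - arrival
    return latencies

requests = [
    ("long-context", 0.0, 2_000_000),  # one huge prompt arrives first
    ("short-1", 0.1, 500),
    ("short-2", 0.2, 500),
    ("short-3", 0.3, 500),
]

for name, latency in simulate_fcfs(requests).items():
    print(f"{name:>12}: {latency:6.2f} s")
# Each short request waits ~40 s behind the long prefill even though its own
# service time is ~0.01 s -- that is head-of-line blocking.
```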
Amey Agrawal (@agrawalamey12)

Super excited to share another incredible system that we have built over the past two years! Training giant foundation models (like Llama-3 405B) costs a FORTUNE 💰 (millions of dollars)! Optimizing the training "recipe" (parallelism, memory tricks, etc.) is critical but

Charlie London (@charlielondon02)

I did suggest leaving the bracketed part out of the title, but this is Yoonsoo's baby, and it's also very good, so highly recommend reading!

Arkil Patel (@arkil_patel)

Thoughtology paper is out! 🔥🐋

We study the reasoning chains of DeepSeek-R1 across a variety of tasks and settings and find several surprising and interesting phenomena!

Incredible effort by the entire team!

🌐: mcgill-nlp.github.io/thoughtology/
abakalova (@abakalova13175)

How do LLMs perform in-context learning?

LLMs can infer tasks from just a few examples in prompts - but how? Our new preprint proposes a two-step mechanism: first contextualize, then aggregate. 🧵
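As a purely illustrative reading of "first contextualize, then aggregate" (a toy sketch of my own, not the preprint's mechanism), imagine each in-context example first being turned into local statistics that pair its input with its label, and the final position then pooling those statistics to recover the shared task:

```python
# Toy, hypothetical illustration of a "contextualize, then aggregate" two-step
# view of in-context learning. This is NOT the preprint's mechanism; it only
# makes the two steps concrete on a linear-regression-in-context task.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])          # the task the prompt implicitly defines
xs = rng.normal(size=(8, 2))            # 8 in-context demonstrations
ys = xs @ w_true                        # their labels

# Step 1 -- contextualize: each demonstration is enriched with its own label,
# producing purely local per-example statistics.
per_example_xxT = [np.outer(x, x) for x in xs]
per_example_xy = [y * x for x, y in zip(xs, ys)]

# Step 2 -- aggregate: pool the local statistics across all demonstrations and
# read off the shared task (here via the normal equations).
XtX = sum(per_example_xxT)
Xty = sum(per_example_xy)
w_inferred = np.linalg.solve(XtX, Xty)

print("true task:    ", w_true)
print("inferred task:", np.round(w_inferred, 3))  # recovers w_true (no noise added)
```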
Kabir (@kabirahuja004)

📢 New Paper!

Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story's world 🌎

W/ Melanie Sclar, and tsvetshop

1/n
William Merrill (@lambdaviking)

Padding a transformer's input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
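For readers outside the expressivity literature, "padding" here just means appending content-free filler tokens so the transformer runs extra parallel computation over positions that carry no new information. A minimal sketch of those mechanics with Hugging Face transformers (the model choice and using the EOS token as the blank filler are my assumptions, not details from the paper):

```python
# Minimal sketch of "padding as test-time compute": append k content-free
# filler tokens after the prompt so the transformer gets k extra positions of
# computation before it must answer. Model name and the choice of EOS as the
# blank/filler token are assumptions for illustration, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: Is 91 a prime number? A:"
inputs = tokenizer(prompt, return_tensors="pt")

k = 32  # number of blank/padding tokens to append
blank_id = tokenizer.eos_token_id
padded_ids = torch.cat(
    [inputs.input_ids, torch.full((1, k), blank_id, dtype=torch.long)], dim=1
)
padded_mask = torch.ones_like(padded_ids)

# Generation starts after the padded positions, so the model has already run
# its attention/MLP layers over k extra (information-free) positions.
out = model.generate(input_ids=padded_ids, attention_mask=padded_mask,
                     max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0][padded_ids.shape[1]:]))
```

Whether the extra positions actually help depends on how the model was trained; the snippet only shows the mechanics of the padded forward pass.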
Kulin Shah (@shahkulin98)

Excited about this work, where we show that a simple algorithm of inverting candidate samples boosts the performance of reward guidance in diffusion models, both in experiments and in theory! Check out the thread by Aayush for more details.

David Chiang (@davidweichiang)

New on arXiv: Knee-Deep in C-RASP, by Andy J Yang, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
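For context, C-RASP is a counting-based variant of RASP used as a symbolic proxy for what transformers can express. As a rough illustration of the counting style of computation it captures (my own toy example, not one from the paper), here is a "majority" check built from running prefix counts:

```python
# Toy illustration (my own, not from the paper) of the counting style of
# computation that C-RASP formalizes: each position counts how often each
# symbol has appeared so far, and the output compares those counts.
def majority_by_prefix_counts(s: str) -> bool:
    """Accept iff strictly more 'a's than 'b's -- a classic counting property."""
    count_a = count_b = 0
    accept = False
    for ch in s:                    # one "position" at a time, left to right
        count_a += ch == "a"        # count of prefix positions satisfying Q_a
        count_b += ch == "b"        # count of prefix positions satisfying Q_b
        accept = count_a > count_b  # comparison of counts at the last position
    return accept

for s in ["aab", "abab", "bba", "aaabb"]:
    print(s, majority_by_prefix_counts(s))
# aab True, abab False, bba False, aaabb True
```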
Kulin Shah (@shahkulin98)

Thrilled to share that our work received the Outstanding Paper Award at ICML! I will be giving the oral presentation on Tuesday at 4:15 PM. Jaeyeon (Jay) Kim @ICML and I will both be at the poster session shortly after the oral presentation. Please attend if possible!

Amey Agrawal (@agrawalamey12)

The bitter lesson of AI infra: The hardest part about building faster LLM inference systems is not designing the systems, but rather it is evaluating if the system is actually faster! 🤔

This graph from a recent top systems venue paper about long-context serving shows average