Michael Hahn (@mhahn29)'s Twitter Profile
Michael Hahn

@mhahn29

Professor at Saarland University
@LstSaar @SIC_Saar. Previously PhD at Stanford @stanfordnlp. Machine learning, language, and cognitive science.

ID: 609965358

Website: https://lacoco-lab.github.io/home/ | Joined: 16-06-2012 10:35:33

174 Tweets

975 Followers

763 Following

Julien Siems (@julien_siems)'s Twitter Profile Photo

1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!

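For intuition, here is a rough NumPy sketch of the delta-rule state update that DeltaProduct builds on. As the thread describes, DeltaProduct applies several such rank-1 (generalized Householder) updates per token, so the effective state transition is a product of simple matrices; the names and shapes below are my own simplification, not the paper's code.

```python
import numpy as np

def delta_step(S, k, v, beta):
    """One delta-rule update: S <- S(I - beta k k^T) + beta v k^T.
    S is a (d_v, d_k) state matrix, k a key, v a value, and beta in
    [0, 1] controls how strongly the old association for k is
    overwritten (the DeltaNet-style update DeltaProduct generalizes).
    """
    return S - beta * np.outer(S @ k, k) + beta * np.outer(v, k)

def deltaproduct_token(S, ks, vs, betas):
    """Apply several delta steps for one token (the 'product' part):
    more steps per token give more expressive transitions at the
    cost of extra sequential work within the token."""
    for k, v, beta in zip(ks, vs, betas):
        S = delta_step(S, k, v, beta)
    return S
```

With one step per token this reduces to a DeltaNet-style recurrence; increasing the number of steps per token is the knob that trades parallelizability for expressivity.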
William Merrill (@lambdaviking)'s Twitter Profile Photo

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀 New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵

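For a concrete picture of padding as test-time compute, here is a minimal sketch: the prompt is extended with contentless blank tokens before the model runs, so a fixed-depth transformer gets more positions, and hence more parallel computation, without receiving any new information. The pad id and token ids are invented for illustration.

```python
PAD_ID = 0  # hypothetical id for the blank "..." token

def pad_input(token_ids, n_pad):
    """Append n_pad blank tokens; the model reads the padded sequence
    and the answer is read off the final position as usual."""
    return token_ids + [PAD_ID] * n_pad

prompt = [17, 42, 7]           # hypothetical token ids
padded = pad_input(prompt, 8)  # [17, 42, 7, 0, 0, 0, 0, 0, 0, 0, 0]
```

Unlike chain-of-thought, the padded positions carry no content, so any gain in power must come purely from the extra computation they permit.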
Sasha Boguraev (@sashaboguraev)'s Twitter Profile Photo

A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models. New work with Kyle Mahowald and Christopher Potts! 🧵👇

Charlie London (@charlielondon02)'s Twitter Profile Photo

New preprint with my supervisor, Varun! We show that padding the input of a Transformer with blank "pause" tokens strictly increases expressivity (in the finite-precision case), enabling it to compute everything in AC0.

Aryaman Arora (@aryaman2020)'s Twitter Profile Photo

new paper! 🫡 why are state space models (SSMs) worse than Transformers at recall over their context? this is a question about the mechanisms underlying model behaviour; therefore, we propose using mechanistic evaluations to answer it!

Manuel Gomez-Rodriguez (@autreche)'s Twitter Profile Photo

Is your LLM overcharging you?! In our new paper arxiv.org/abs/2505.21627, we show that pay-per-token creates an incentive for LLM providers to misreport the (number of) tokens an LLM used to generate an output, and users cannot know whether a provider is overcharging them (1/n)
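
A toy example of why the user cannot audit the bill from the output alone: with a subword vocabulary, many different token sequences detokenize to the same string, so a reported token count cannot be checked against the text you receive. The vocabulary here is invented for illustration.

```python
# Two tokenizations of the same string under a toy vocabulary.
vocab = {1: "hel", 2: "lo", 3: "h", 4: "e", 5: "l", 6: "o"}

def detokenize(ids):
    return "".join(vocab[i] for i in ids)

honest = [1, 2]             # "hello" -> billed as 2 tokens
inflated = [3, 4, 5, 5, 6]  # "hello" -> billed as 5 tokens

assert detokenize(honest) == detokenize(inflated) == "hello"
```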

Michael Hanna (@michaelwhanna)'s Twitter Profile Photo

Mateusz and I are excited to announce circuit-tracer, a library that makes circuit-finding simple! Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on neuronpedia: shorturl.at/SUX2A

<a href="/mntssys/">Mateusz</a> and I are excited to announce circuit-tracer, a library that makes circuit-finding simple!

Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on <a href="/neuronpedia/">neuronpedia</a>: shorturl.at/SUX2A
Zixuan Wang (@zzzixuanwang)'s Twitter Profile Photo

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training? In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient! arxiv.org/abs/2505.23683 🧵 below (1/10)

Aaditya Singh (@aaditya6284)'s Twitter Profile Photo

Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed low rank attention), and of course, add a focus on dynamics. Thread 🧵 with some extra highlights

Yuekun Yao (@yuekun_yao)'s Twitter Profile Photo

Can language models learn implicit reasoning without chain-of-thought? Our new paper shows: Yes, LMs can learn k-hop reasoning; however, it comes at the cost of an exponential increase in training data and linear growth in model depth as k increases. arxiv.org/pdf/2505.17923

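For a concrete sense of the task family, here is a generic k-hop setup (not necessarily the paper's exact format): the model must compose k relational lookups in its forward pass without writing out intermediate steps, and the reported cost is training data exponential in k and depth linear in k.

```python
import random

def make_khop_example(entities, relations, k, rng):
    """Build one k-hop query: a start entity plus k relation names;
    the label is the entity reached by following the k facts in
    order. A generic multi-hop setup for illustration only."""
    facts = {(e, r): rng.choice(entities)
             for e in entities for r in relations}
    start = rng.choice(entities)
    hops = [rng.choice(relations) for _ in range(k)]
    answer = start
    for r in hops:
        answer = facts[(answer, r)]
    return facts, (start, hops), answer

rng = random.Random(0)
facts, query, answer = make_khop_example(list("ABCDE"), ["r1", "r2"], 3, rng)
```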
Songlin Yang (@songlinyang4)'s Twitter Profile Photo

Check out log-linear attention—our latest approach to overcoming the fundamental limitation of RNNs’ constant state size, while preserving subquadratic time and space complexity
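
As a cartoon of how a state can grow logarithmically rather than staying constant, one classic trick is Fenwick-tree-style bucketing: keep one summary per dyadic segment and merge equal-sized buckets, binary-counter style. This is my own toy bookkeeping sketch of that idea, not the paper's algorithm.

```python
import numpy as np

def update_buckets(buckets, k, v):
    """Maintain segment summaries as (size, S) pairs, where S sums
    outer(v, k) over the segment. Equal-sized buckets merge, so after
    T tokens at most about log2(T) + 1 buckets remain."""
    buckets.append((1, np.outer(v, k)))
    while len(buckets) >= 2 and buckets[-1][0] == buckets[-2][0]:
        (n2, s2), (n1, s1) = buckets.pop(), buckets.pop()
        buckets.append((n1 + n2, s1 + s2))
    return buckets

def read(buckets, q):
    """Query every bucket summary: O(log T) work per token, sitting
    between a constant-size RNN state and full quadratic attention."""
    return sum(S @ q for _, S in buckets)
```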

Taiga Someya (@agiats_football)'s Twitter Profile Photo

📝 Our #ACL2025 paper is now on arXiv! "Information Locality as an Inductive Bias for Neural Language Models". We quantify how a language's local predictability affects its learnability by neural LMs, using our metric, m-local entropy. Paper: arxiv.org/abs/2506.05136
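
Assuming m-local entropy means the conditional entropy of a symbol given its previous m symbols (the paper's exact definition may differ), a count-based estimator looks roughly like this:

```python
from collections import Counter
from math import log2

def m_local_entropy(corpus, m):
    """Estimate H(x_t | x_{t-m}, ..., x_{t-1}) from n-gram counts;
    lower values mean symbols are more predictable from local
    context. One plausible reading of the metric, for illustration."""
    ctx, joint = Counter(), Counter()
    for seq in corpus:
        for t in range(m, len(seq)):
            c = tuple(seq[t - m:t])
            ctx[c] += 1
            joint[(c, seq[t])] += 1
    total = sum(joint.values())
    return -sum(n / total * log2(n / ctx[c])
                for (c, x), n in joint.items())

print(m_local_entropy(["abab", "abba"], m=2))  # 0.5 bits
```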

William Merrill (@lambdaviking)'s Twitter Profile Photo

A fun project with really thorough analysis of how LLMs try and often fail to implement parsing algorithms. Bonus: find out what this all has to do with the Kalamang language from New Guinea

Mark Rofin (@broccolitwit)'s Twitter Profile Photo

In Transformer theory research, we often use tiny models and toy tasks. A straightforward criticism is that this setting is far from the giant real-world LLMs. Does this mean that the theoretical insights don’t transfer to them? Check out the new cool work investigating that! 👇

Morris Yau (@morrisyau)'s Twitter Profile Photo

Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918

Geoffrey Irving (@geoffreyirving)'s Twitter Profile Photo

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

Tal Linzen (@tallinzen)'s Twitter Profile Photo

I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than mainstream LLMs do, through architectural modifications and interpretability-style steering.