Michael Hahn (@mhahn29) 's Twitter Profile
Michael Hahn

@mhahn29

Professor at Saarland University
@LstSaar @SIC_Saar. Previously PhD at Stanford @stanfordnlp. Machine learning, language, and cognitive science.

ID: 609965358

Link: https://lacoco-lab.github.io/home/ · Joined: 16-06-2012 10:35:33

174 Tweets

975 Followers

763 Following

Julien Siems (@julien_siems) 's Twitter Profile Photo

1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
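A minimal sketch of the mechanism, assuming a DeltaNet-style delta-rule core (the shapes, normalization, and the per-token micro-step loop are illustrative assumptions, not the paper's exact parameterization): DeltaProduct spends a few extra sequential micro-steps per token to build a more expressive state transition.

```python
import numpy as np

def deltaproduct_token_update(S, keys, values, betas):
    """Apply n_h delta-rule micro-steps for one token. S has shape
    (d_v, d_k); each k in keys is (d_k,), each v in values is (d_v,).
    More micro-steps per token = a more expressive transition, at the
    cost of more sequential work (the tradeoff in the thread)."""
    for k, v, beta in zip(keys, values, betas):
        k = k / (np.linalg.norm(k) + 1e-8)          # unit-norm key
        # Delta rule: S <- S(I - beta * k k^T) + beta * v k^T,
        # i.e. nudge S's prediction for key k toward value v.
        S = S - beta * np.outer(S @ k - v, k)
    return S
```

With one micro-step per token this collapses to a plain delta-rule update; with several, the effective per-token transition matrix becomes a product of Householder-style rank-one updates, which is where the extra expressivity comes from.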
William Merrill (@lambdaviking) 's Twitter Profile Photo

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
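A toy illustration of the setup (the token ids and pad placement here are my assumptions, not the paper's formal model): padding appends content-free tokens, which buys the transformer extra parallel computation without adding any information.

```python
def pad_input(token_ids: list[int], pad_id: int, n_pad: int) -> list[int]:
    """Append n_pad blank tokens after the prompt. Each one adds a full
    forward-pass 'column' of computation the model can use as scratch
    space, even though the padding itself carries no information."""
    return token_ids + [pad_id] * n_pad
```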
Aryaman Arora (@aryaman2020) 's Twitter Profile Photo

new paper! 🫡

why are state space models (SSMs) worse than Transformers at recall over their context? this is a question about the mechanisms underlying model behaviour: therefore, we propose using mechanistic evaluations to answer it!
Manuel Gomez-Rodriguez (@autreche) 's Twitter Profile Photo

Is your LLM overcharging you?! In our new paper arxiv.org/abs/2505.21627, we show that pay-per-token creates an incentive for LLM providers to misreport the (number of) tokens an LLM used to generate an output, and users cannot know whether a provider is overcharging them (1/n)
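One way to see why this is hard to audit (a toy demonstration, not the paper's construction): many different token sequences decode to the same string, so the text a user receives does not pin down how many tokens the provider actually billed for.

```python
from transformers import AutoTokenizer  # assumes HF transformers is installed

tok = AutoTokenizer.from_pretrained("gpt2")
text = "overcharging"
canonical = tok.encode(text)                            # usual BPE segmentation
char_level = [t for c in text for t in tok.encode(c)]   # one token per character

# Both sequences decode to the identical output string...
assert tok.decode(canonical) == tok.decode(char_level) == text
# ...but would produce very different pay-per-token bills.
print(len(canonical), len(char_level))
```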

Michael Hanna (@michaelwhanna) 's Twitter Profile Photo

Mateusz and I are excited to announce circuit-tracer, a library that makes circuit-finding simple!

Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on neuronpedia: shorturl.at/SUX2A
Zixuan Wang (@zzzixuanwang) 's Twitter Profile Photo

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
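A hypothetical sketch of what "easy-to-hard data" means operationally (the sampler and difficulty buckets are my assumptions; the paper's condition concerns the training distribution, not this particular loop):

```python
import random

def easy_to_hard_stream(examples_by_difficulty, steps_per_level):
    """Yield training examples level by level: cover difficulty k
    before difficulty k+1, so each new level only requires learning
    one more composed reasoning step on top of what is already known."""
    for level in sorted(examples_by_difficulty):
        for _ in range(steps_per_level):
            yield random.choice(examples_by_difficulty[level])
```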
Aaditya Singh (@aaditya6284) 's Twitter Profile Photo

Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed low rank attention), and of course, add a focus on dynamics. Thread 🧵 with some extra highlights

Songlin Yang (@songlinyang4) 's Twitter Profile Photo

Check out log-linear attention—our latest approach to overcoming the fundamental limitation of RNNs’ constant state size, while preserving subquadratic time and space complexity
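A schematic of the core idea, assuming a Fenwick-style power-of-two partition of the prefix (my reading of "log-linear"; the actual layer and its gating are more involved): instead of one constant-size state, position t keeps one summary state per bucket, and the number of buckets grows only logarithmically with context length.

```python
def prefix_buckets(t: int) -> list[tuple[int, int]]:
    """Split positions [0, t) into power-of-two segments, largest first.
    Each segment would own one summary state, so a query at position t
    touches at most ~log2(t) states rather than one fixed-size state."""
    buckets, start = [], 0
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            buckets.append((start, start + size))
            start += size
    return buckets

print(prefix_buckets(13))  # [(0, 8), (8, 12), (12, 13)]: 3 states for 13 tokens
```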

Taiga Someya (@agiats_football) 's Twitter Profile Photo

📝 Our #ACL2025 paper is now on arXiv! "Information Locality as an Inductive Bias for Neural Language Models" We quantify how the local predictability of a language affects its learnability by neural LMs, using our metric, m-local entropy. paper: arxiv.org/abs/2506.05136
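A rough sketch of the kind of quantity involved (an n-gram plug-in estimate of H(x_t | x_{t-m}, …, x_{t-1}); the paper's m-local entropy is defined more carefully than this):

```python
from collections import Counter
import math

def m_local_entropy(seq, m):
    """Plug-in estimate of the conditional entropy of a symbol given
    its m preceding symbols: low values = locally predictable data."""
    ctx, joint = Counter(), Counter()
    for i in range(m, len(seq)):
        c = tuple(seq[i - m:i])
        ctx[c] += 1
        joint[c, seq[i]] += 1
    n = sum(joint.values())
    return -sum((cnt / n) * math.log2(cnt / ctx[c])
                for (c, _), cnt in joint.items())

print(m_local_entropy("abababababab", 1))  # ~0.0: fully locally predictable
```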

William Merrill (@lambdaviking) 's Twitter Profile Photo

A fun project with really thorough analysis of how LLMs try and often fail to implement parsing algorithms. Bonus: find out what this all has to do with the Kalamang language from New Guinea

Mark Rofin (@broccolitwit) 's Twitter Profile Photo

In Transformer theory research, we often use tiny models and toy tasks. A straightforward criticism is that this setting is far from the giant real-world LLMs. Does this mean that the theoretical insights don’t transfer to them? Check out the new cool work investigating that! 👇

Morris Yau (@morrisyau) 's Twitter Profile Photo

Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound).

Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
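The sequential-parallel duality they exploit can be illustrated with plain (unnormalized) linear attention (a generic sketch, not the Transformer-PSM layer itself): the same layer is a parallel prefix computation at training time and a constant-size recurrence at decode time.

```python
import numpy as np

def parallel_form(Q, K, V):
    """Training: all positions at once via cumulative outer products."""
    S = np.cumsum(np.einsum('td,te->tde', K, V), axis=0)   # prefix states
    return np.einsum('td,tde->te', Q, S)

def recurrent_form(Q, K, V):
    """Decoding: one token at a time with an O(1)-size state."""
    S = np.zeros((K.shape[1], V.shape[1]))
    out = []
    for q, k, v in zip(Q, K, V):
        S += np.outer(k, v)        # constant-size state update
        out.append(q @ S)          # constant time per decoded token
    return np.stack(out)

Q, K, V = (np.random.randn(5, 4) for _ in range(3))
assert np.allclose(parallel_form(Q, K, V), recurrent_form(Q, K, V))
```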
Geoffrey Irving (@geoffreyirving) 's Twitter Profile Photo

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

Tal Linzen (@tallinzen) 's Twitter Profile Photo

I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than mainstream LLMs do, through architectural modifications and interpretability-style steering.