Michael Hahn (@mhahn29) 's Twitter Profile
Michael Hahn

@mhahn29

Professor at Saarland University
@LstSaar @SIC_Saar. Previously PhD at Stanford @stanfordnlp. Machine learning, language, and cognitive science.

ID: 609965358

Link: https://lacoco-lab.github.io/home/ · Joined: 16-06-2012 10:35:33

174 Tweets

975 Followers

763 Following

Julien Siems (@julien_siems) 's Twitter Profile Photo

1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
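A minimal sketch of the mechanism, assuming a DeltaNet-style delta-rule core (the shapes, normalization, and the per-token micro-step loop are illustrative assumptions, not the paper's exact parameterization): DeltaProduct spends a few extra sequential micro-steps per token to build a more expressive state transition.

```python
import numpy as np

def deltaproduct_token_update(S, keys, values, betas):
    """Apply n_h delta-rule micro-steps for one token. S has shape
    (d_v, d_k); each k in keys is (d_k,), each v in values is (d_v,).
    More micro-steps per token = a more expressive transition, at the
    cost of more sequential work (the tradeoff in the thread)."""
    for k, v, beta in zip(keys, values, betas):
        k = k / (np.linalg.norm(k) + 1e-8)          # unit-norm key
        # Delta rule: S <- S(I - beta * k k^T) + beta * v k^T,
        # i.e. nudge S's prediction for key k toward value v.
        S = S - beta * np.outer(S @ k - v, k)
    return S
```

With one micro-step per token this collapses to a plain delta-rule update; with several, the effective per-token transition matrix becomes a product of Householder-style rank-one updates, which is where the extra expressivity comes from.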
William Merrill (@lambdaviking) 's Twitter Profile Photo

Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀

New work with Ashish Sabharwal addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
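A toy illustration of the setup (the token ids and pad placement here are my assumptions, not the paper's formal model): padding appends content-free tokens, which buys the transformer extra parallel computation without adding any information.

```python
def pad_input(token_ids: list[int], pad_id: int, n_pad: int) -> list[int]:
    """Append n_pad blank tokens after the prompt. Each one adds a full
    forward-pass 'column' of computation the model can use as scratch
    space, even though the padding itself carries no information."""
    return token_ids + [pad_id] * n_pad
```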
Aryaman Arora (@aryaman2020) 's Twitter Profile Photo

new paper! 🫡

why are state space models (SSMs) worse than Transformers at recall over their context? this is a question about the mechanisms underlying model behaviour: therefore, we propose using mechanistic evaluations to answer it!
Manuel Gomez-Rodriguez (@autreche) 's Twitter Profile Photo

Is your LLM overcharging you?! In our new paper arxiv.org/abs/2505.21627, we show that pay-per-token creates an incentive for LLM providers to misreport the (number of) tokens an LLM used to generate an output, and users cannot know whether a provider is overcharging them (1/n)
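One way to see why this is hard to audit (a toy demonstration, not the paper's construction): many different token sequences decode to the same string, so the text a user receives does not pin down how many tokens the provider actually billed for.

```python
from transformers import AutoTokenizer  # assumes HF transformers is installed

tok = AutoTokenizer.from_pretrained("gpt2")
text = "overcharging"
canonical = tok.encode(text)                            # usual BPE segmentation
char_level = [t for c in text for t in tok.encode(c)]   # one token per character

# Both sequences decode to the identical output string...
assert tok.decode(canonical) == tok.decode(char_level) == text
# ...but would produce very different pay-per-token bills.
print(len(canonical), len(char_level))
```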

Michael Hanna (@michaelwhanna) 's Twitter Profile Photo

Mateusz and I are excited to announce circuit-tracer, a library that makes circuit-finding simple!

Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on neuronpedia: shorturl.at/SUX2A
Zixuan Wang (@zzzixuanwang) 's Twitter Profile Photo

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
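A hypothetical sketch of what "easy-to-hard data" means operationally (the sampler and difficulty buckets are my assumptions; the paper's condition concerns the training distribution, not this particular loop):

```python
import random

def easy_to_hard_stream(examples_by_difficulty, steps_per_level):
    """Yield training examples level by level: cover difficulty k
    before difficulty k+1, so each new level only requires learning
    one more composed reasoning step on top of what is already known."""
    for level in sorted(examples_by_difficulty):
        for _ in range(steps_per_level):
            yield random.choice(examples_by_difficulty[level])
```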
Aaditya Singh (@aaditya6284) 's Twitter Profile Photo

Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit closer to practice (with multi-headed low rank attention), and of course, add a focus on dynamics. Thread 🧵 with some extra highlights

Songlin Yang (@songlinyang4) 's Twitter Profile Photo

Check out log-linear attention—our latest approach to overcoming the fundamental limitation of RNNs’ constant state size, while preserving subquadratic time and space complexity
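A schematic of the core idea, assuming a Fenwick-style power-of-two partition of the prefix (my reading of "log-linear"; the actual layer and its gating are more involved): instead of one constant-size state, position t keeps one summary state per bucket, and the number of buckets grows only logarithmically with context length.

```python
def prefix_buckets(t: int) -> list[tuple[int, int]]:
    """Split positions [0, t) into power-of-two segments, largest first.
    Each segment would own one summary state, so a query at position t
    touches at most ~log2(t) states rather than one fixed-size state."""
    buckets, start = [], 0
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            buckets.append((start, start + size))
            start += size
    return buckets

print(prefix_buckets(13))  # [(0, 8), (8, 12), (12, 13)]: 3 states for 13 tokens
```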

Taiga Someya (@agiats_football) 's Twitter Profile Photo

📝 Our #ACL2025 paper is now on arXiv! "Information Locality as an Inductive Bias for Neural Language Models" We quantify how the local predictability of a language affects its learnability by neural LMs, using our metric, m-local entropy. paper: arxiv.org/abs/2506.05136
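A rough sketch of the kind of quantity involved (an n-gram plug-in estimate of H(x_t | x_{t-m}, …, x_{t-1}); the paper's m-local entropy is defined more carefully than this):

```python
from collections import Counter
import math

def m_local_entropy(seq, m):
    """Plug-in estimate of the conditional entropy of a symbol given
    its m preceding symbols: low values = locally predictable data."""
    ctx, joint = Counter(), Counter()
    for i in range(m, len(seq)):
        c = tuple(seq[i - m:i])
        ctx[c] += 1
        joint[c, seq[i]] += 1
    n = sum(joint.values())
    return -sum((cnt / n) * math.log2(cnt / ctx[c])
                for (c, _), cnt in joint.items())

print(m_local_entropy("abababababab", 1))  # ~0.0: fully locally predictable
```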

William Merrill (@lambdaviking) 's Twitter Profile Photo

A fun project with really thorough analysis of how LLMs try and often fail to implement parsing algorithms. Bonus: find out what this all has to do with the Kalamang language from New Guinea

Mark Rofin (@broccolitwit) 's Twitter Profile Photo

In Transformer theory research, we often use tiny models and toy tasks. A straightforward criticism is that this setting is far from the giant real-world LLMs. Does this mean that the theoretical insights don’t transfer to them? Check out the new cool work investigating that! 👇

Morris Yau (@morrisyau) 's Twitter Profile Photo

Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound).

Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
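The sequential-parallel duality they exploit can be illustrated with plain (unnormalized) linear attention (a generic sketch, not the Transformer-PSM layer itself): the same layer is a parallel prefix computation at training time and a constant-size recurrence at decode time.

```python
import numpy as np

def parallel_form(Q, K, V):
    """Training: all positions at once via cumulative outer products."""
    S = np.cumsum(np.einsum('td,te->tde', K, V), axis=0)   # prefix states
    return np.einsum('td,tde->te', Q, S)

def recurrent_form(Q, K, V):
    """Decoding: one token at a time with an O(1)-size state."""
    S = np.zeros((K.shape[1], V.shape[1]))
    out = []
    for q, k, v in zip(Q, K, V):
        S += np.outer(k, v)        # constant-size state update
        out.append(q @ S)          # constant time per decoded token
    return np.stack(out)

Q, K, V = (np.random.randn(5, 4) for _ in range(3))
assert np.allclose(parallel_form(Q, K, V), recurrent_form(Q, K, V))
```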
Geoffrey Irving (@geoffreyirving) 's Twitter Profile Photo

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

Tal Linzen (@tallinzen) 's Twitter Profile Photo

I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than mainstream LLMs do, through architectural modifications and interpretability-style steering.