Michael Hanna (@michaelwhanna)'s Twitter Profile
Michael Hanna

@michaelwhanna

PhD student at the University of Amsterdam / ILLC, interested in computational linguistics and (mechanistic) interpretability

ID: 1163241332904800262

Link: http://hannamw.github.io | Joined: 19-08-2019 00:08:47

52 Tweets

404 Followers

382 Following

Tal Haklay (@tal_haklay)'s Twitter Profile Photo

1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored! We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇

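For readers new to the distinction: a standard circuit keeps an edge at every token position once it is included, while a position-aware circuit indexes each edge by position. A rough Python sketch of the two objects (names and structure are illustrative, not the paper's code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str  # upstream component, e.g. "head.0.3"
    dst: str  # downstream component, e.g. "mlp.1"

@dataclass(frozen=True)
class PositionalEdge(Edge):
    pos: int  # token position at which this edge carries the computation

def positions_implied(edges: set[Edge], seq_len: int) -> set[PositionalEdge]:
    # A position-agnostic circuit implicitly activates every edge at every
    # position; a position-aware search instead keeps only the (edge, position)
    # pairs that matter, which is how it can be more faithful yet stay compact.
    return {PositionalEdge(e.src, e.dst, p) for e in edges for p in range(seq_len)}
```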
Aaron Mueller (@amuuueller)'s Twitter Profile Photo

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!

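Circuit faithfulness, one of the things a benchmark like this has to pin down, is typically measured by ablating everything outside the circuit and checking how much of the full model's behavior survives. A minimal sketch of one common normalization (my illustration; the benchmark's exact metrics may differ):

```python
import torch

def faithfulness(full: torch.Tensor, circuit: torch.Tensor,
                 ablated: torch.Tensor) -> torch.Tensor:
    """Fraction of the full model's logit difference the circuit recovers.

    full    - logit diff of the intact model
    circuit - logit diff with only circuit components active (rest mean-ablated)
    ablated - logit diff with everything mean-ablated (baseline)

    Scores 1.0 when the circuit matches the full model and 0.0 when it
    does no better than the fully ablated baseline.
    """
    return (circuit - ablated) / (full - ablated)
```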
Michael Hanna (@michaelwhanna)'s Twitter Profile Photo

I'll be presenting this in person at NAACL HLT 2025, tomorrow at 11am in Ballroom C! Come on by - I'd love to chat with folks about this and all things interp / cog sci!

Anthropic (@anthropicai)'s Twitter Profile Photo

Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.

Emmanuel Ameisen (@mlpowered)'s Twitter Profile Photo

The methods we used to trace the thoughts of Claude are now open to the public! Today, we are releasing a library which lets anyone generate graphs which show the internal reasoning steps a model used to arrive at an answer.

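At their core, attribution graphs are built from first-order contribution scores between model components. A toy sketch of the underlying activation-times-gradient idea (illustrative only, not the released library's API; see the library itself for the real method):

```python
import torch

torch.manual_seed(0)
W_in, W_out = torch.randn(8, 4), torch.randn(4, 3)  # toy two-layer network

x = torch.randn(8)
# Treat the intermediate features as leaf nodes so we can read their gradients.
feats = (x @ W_in).relu().detach().requires_grad_(True)
logits = feats @ W_out

logits[2].backward()                # pick one output logit to explain
edge_scores = feats * feats.grad    # per-feature contribution to that logit

# Because the final map is linear, these scores decompose the logit exactly:
assert torch.isclose(edge_scores.sum(), logits[2])
print(edge_scores)                  # candidate feature -> logit edges, by weight
```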
Jack Lindsey (@jack_w_lindsey)'s Twitter Profile Photo

We’re releasing an open-source library and public interactive interface for tracing the internal “thoughts” of a language model. Now anyone can explore the inner workings of LLMs — and it only takes seconds!
