Michael Hanna (@michaelwhanna)'s Twitter Profile
Michael Hanna

@michaelwhanna

PhD student at the University of Amsterdam / ILLC, interested in computational linguistics and (mechanistic) interpretability

ID: 1163241332904800262

Link: http://hannamw.github.io | Joined: 19-08-2019 00:08:47

52 Tweets

404 Followers

382 Following

Tal Haklay (@tal_haklay)'s Twitter Profile Photo

1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored! We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇

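For readers new to the distinction: a standard circuit keeps an edge at every token position once it is included, while a position-aware circuit indexes each edge by position. A rough Python sketch of the two objects (names and structure are illustrative, not the paper's code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str  # upstream component, e.g. "head.0.3"
    dst: str  # downstream component, e.g. "mlp.1"

@dataclass(frozen=True)
class PositionalEdge(Edge):
    pos: int  # token position at which this edge carries the computation

def positions_implied(edges: set[Edge], seq_len: int) -> set[PositionalEdge]:
    # A position-agnostic circuit implicitly activates every edge at every
    # position; a position-aware search instead keeps only the (edge, position)
    # pairs that matter, which is how it can be more faithful yet stay compact.
    return {PositionalEdge(e.src, e.dst, p) for e in edges for p in range(seq_len)}
```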
Aaron Mueller (@amuuueller)'s Twitter Profile Photo

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!

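Circuit faithfulness, one of the things a benchmark like this has to pin down, is typically measured by ablating everything outside the circuit and checking how much of the full model's behavior survives. A minimal sketch of one common normalization (my illustration; the benchmark's exact metrics may differ):

```python
import torch

def faithfulness(full: torch.Tensor, circuit: torch.Tensor,
                 ablated: torch.Tensor) -> torch.Tensor:
    """Fraction of the full model's logit difference the circuit recovers.

    full    - logit diff of the intact model
    circuit - logit diff with only circuit components active (rest mean-ablated)
    ablated - logit diff with everything mean-ablated (baseline)

    Scores 1.0 when the circuit matches the full model and 0.0 when it
    does no better than the fully ablated baseline.
    """
    return (circuit - ablated) / (full - ablated)
```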
Michael Hanna (@michaelwhanna)'s Twitter Profile Photo

I'll be presenting this in person at NAACL HLT 2025, tomorrow at 11am in Ballroom C! Come on by - I'd love to chat with folks about this and all things interp / cog sci!

Anthropic (@anthropicai)'s Twitter Profile Photo

Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.

Emmanuel Ameisen (@mlpowered)'s Twitter Profile Photo

The methods we used to trace the thoughts of Claude are now open to the public! Today, we are releasing a library which lets anyone generate graphs which show the internal reasoning steps a model used to arrive at an answer.

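At their core, attribution graphs are built from first-order contribution scores between model components. A toy sketch of the underlying activation-times-gradient idea (illustrative only, not the released library's API; see the library itself for the real method):

```python
import torch

torch.manual_seed(0)
W_in, W_out = torch.randn(8, 4), torch.randn(4, 3)  # toy two-layer network

x = torch.randn(8)
# Treat the intermediate features as leaf nodes so we can read their gradients.
feats = (x @ W_in).relu().detach().requires_grad_(True)
logits = feats @ W_out

logits[2].backward()                # pick one output logit to explain
edge_scores = feats * feats.grad    # per-feature contribution to that logit

# Because the final map is linear, these scores decompose the logit exactly:
assert torch.isclose(edge_scores.sum(), logits[2])
print(edge_scores)                  # candidate feature -> logit edges, by weight
```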
Jack Lindsey (@jack_w_lindsey)'s Twitter Profile Photo

We’re releasing an open-source library and public interactive interface for tracing the internal “thoughts” of a language model. Now anyone can explore the inner workings of LLMs — and it only takes seconds!
