Gabriele Sarti (@gsarti_) 's Twitter Profile
Gabriele Sarti

@gsarti_

PhD Student @GroNLP 🐮, core dev of @InseqLib (inseq.org). Interpretability ∩ HCI ∩ #NLProc. Prev: @AmazonScience, @Aindo_AI, @ItaliaNLP_Lab.

ID: 925913081032650752

Link: http://gsarti.com · Joined: 02-11-2017 02:30:55

1.1K Tweets

2.2K Followers

1.1K Following

Paul Bogdan (@paulcbogdan) 's Twitter Profile Photo

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check out our tool & unpack CoT yourself 🧵
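
For context on the resampling idea mentioned above: one generic way to measure how much a single reasoning step shapes the final answer is to resample continuations with and without that step and check how often the answer changes. The sketch below only illustrates that general recipe and is not the paper's method or code; `generate` and `answers_agree` are hypothetical stand-ins for a sampling call and an answer-matching check.

```python
from typing import Callable, List

def step_importance(
    question: str,
    cot_steps: List[str],                        # chain-of-thought split into steps
    step_idx: int,                               # index of the step to test
    generate: Callable[[str], str],              # hypothetical: prompt -> final answer
    answers_agree: Callable[[str, str], bool],   # hypothetical answer comparison
    n_samples: int = 20,
) -> float:
    """Fraction of resampled runs whose answer changes when one step is dropped.

    Generic counterfactual-resampling importance; a step that strongly shapes
    the rest of the reasoning ("anchor"-like behavior) scores close to 1.
    """
    with_step = question + "\n" + "\n".join(cot_steps[: step_idx + 1])
    without_step = question + "\n" + "\n".join(cot_steps[:step_idx])
    changed = 0
    for _ in range(n_samples):
        a = generate(with_step)       # continuation conditioned on the step
        b = generate(without_step)    # continuation resampled without it
        if not answers_agree(a, b):
            changed += 1
    return changed / n_samples
```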

Gabriele Sarti (@gsarti_) 's Twitter Profile Photo

Interested in applying MI methods for circuit finding or causal variable localization? 🔎 Check out our shared task at BlackboxNLP, co-located with EMNLP 2025. Deadline for submissions: August 1st!

David Bau (@davidbau) 's Twitter Profile Photo

Noam Brown OpenAI It depends on what you mean by "great research". In industry, "great research" means ideas that lead to great products.

In academia, great research is great *teaching*. That gets to the heart of the difference between industry and academia.

My take: davidbau.com/archives/2025/…
Koyena Pal (@kpal_koyena) 's Twitter Profile Photo

🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening August 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register:
BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

One month to go! ⏰
Working on featurization methods - ways to transform LM activations to better isolate causal variables?
Submit your work to the Causal Variable Localization Track of the MIB Shared Task!
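
For readers outside the shared task: a "featurizer" here usually means an (ideally invertible) transformation of a hidden state into a feature space where a causal variable is concentrated in a few dimensions, so those dimensions can be swapped between two runs. Below is a minimal PyTorch sketch in that spirit; the class and function names are illustrative, and this is not the shared task's required interface.

```python
import torch
import torch.nn as nn

class LinearFeaturizer(nn.Module):
    """Invertible (orthogonal) linear map between hidden space and feature space."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Orthogonal parametrization keeps the map trivially invertible.
        self.rotation = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_size, hidden_size, bias=False)
        )

    def featurize(self, h: torch.Tensor) -> torch.Tensor:
        return self.rotation(h)                  # z = h @ W^T

    def invert(self, z: torch.Tensor) -> torch.Tensor:
        return z @ self.rotation.weight          # W orthogonal, so this undoes featurize


def interchange(featurizer: LinearFeaturizer,
                h_base: torch.Tensor,
                h_source: torch.Tensor,
                dims: list[int]) -> torch.Tensor:
    """Swap the selected feature dimensions of the base run with the source run."""
    z_base = featurizer.featurize(h_base).clone()
    z_source = featurizer.featurize(h_source)
    z_base[..., dims] = z_source[..., dims]      # intervene on the candidate variable
    return featurizer.invert(z_base)


# Toy usage: swap the first 16 feature dimensions between two hidden states.
feat = LinearFeaturizer(hidden_size=768)
h_base, h_source = torch.randn(1, 768), torch.randn(1, 768)
patched = interchange(feat, h_base, h_source, dims=list(range(16)))
```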
BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

⏳ Three weeks left! Submit your work to the MIB Shared Task at #BlackboxNLP, co-located with EMNLP 2025.

Whether you're working on circuit discovery or causal variable localization, this is your chance to benchmark your method in a rigorous setup!
BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

Just 10 days to go until the results submission deadline for the MIB Shared Task at #BlackboxNLP!

If you're working on:
🧠 Circuit discovery
🔍 Feature attribution
🧪 Causal variable localization
now’s the time to polish and submit!

Join us on Discord: discord.gg/n5uwjQcxPR
Helena Casademunt (@hcasademunt) 's Twitter Profile Photo

Problem: Train LLM on insecure code → it becomes broadly misaligned
Solution: Add safety data? What if you can't?

Use interpretability! We remove misaligned concepts during finetuning to steer OOD generalization

We reduce emergent misalignment 10x w/o modifying training data
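
One generic way to implement "removing a concept during finetuning" (not necessarily what this paper does) is directional ablation: project a known concept direction out of a layer's activations with a forward hook, so finetuning gradients cannot route behavior through it. A rough PyTorch sketch, where the layer path and `concept_dir` are assumptions:

```python
import torch

def make_ablation_hook(concept_dir: torch.Tensor):
    """Forward hook removing the component of a layer's output along one
    unit-norm 'concept' direction (generic directional ablation, illustrative only)."""
    d = concept_dir / concept_dir.norm()

    def hook(module, inputs, output):
        h = output[0] if isinstance(output, tuple) else output
        h = h - (h @ d).unsqueeze(-1) * d        # zero out the concept component
        return (h, *output[1:]) if isinstance(output, tuple) else h

    return hook

# Hypothetical usage with a HuggingFace-style decoder (layer index is illustrative):
# handle = model.model.layers[12].register_forward_hook(make_ablation_hook(direction))
# ...run finetuning as usual; later: handle.remove()
```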
Hosein Mohebbi (@hmohebbi75) 's Twitter Profile Photo

I’ll be on the job market in early 2026, looking for research scientist or academic roles in NLP/Speech. I’ll be at #ACL2025 & giving a tutorial on #interpretability at #Interspeech2025; I’d love to chat & connect if there are any opportunities!🤗 Website: hmohebbi.github.io 🧵

BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

📝 Technical report guidelines are out!

If you're submitting to the MIB Shared Task at #BlackboxNLP, feel free to take a look to help you prepare your report: blackboxnlp.github.io/2025/task/
BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

Results deadline extended by one week!
Following requests from participants, we’re extending the MIB Shared Task submission deadline by one week.

🗓️ New deadline: August 8, 2025
Submit your method via the MIB leaderboard!
Emmanuel Ameisen (@mlpowered) 's Twitter Profile Photo

Earlier this year, we showed a method to interpret the intermediate steps a model takes to produce an answer.

But we were missing a key bit of information: explaining why the model attends to specific concepts.

Today, we do just that 🧵
neuronpedia (@neuronpedia) 's Twitter Profile Photo

Today, we're releasing The Circuit Analysis Research Landscape: an interpretability post extending & open-sourcing Anthropic's circuit tracing work, co-authored by Paul Jankura, Google DeepMind, Goodfire, EleutherAI, and Decode Research. Here's a quick demo, details follow: ⤵️

BlackboxNLP (@blackboxnlp) 's Twitter Profile Photo

The report deadline was also extended to August 10th! Note that this is the final extension. We look forward to reading your reports! ✍️

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

For a Goodfire/Anthropic meet-up later this month, I wrote a discussion doc: Assessing skeptical views of interpretability research. Spoiler: it's an incredible moment for interpretability research. The skeptical views sound like a call to action to me. Link just below.

Zhijing Jin✈️ ICLR Singapore (@zhijingjin) 's Twitter Profile Photo

Our "Competitions of Mechanisms" paper proposes an interesting way to interpret LLM behaviors through how the model handles multiple conflicting mechanisms, e.g., in-context knowledge vs. in-weights knowledge 🧐 This is an elegant philosophical way of thinking --

Gabriele Sarti (@gsarti_) 's Twitter Profile Photo

TIL Ken Liu predicted an eerily familiar setting featuring OpenAI and sama-like characters + US-China race dynamics in his short story "The Perfect Match" from 2012.