Stefan Heimersheim (@sheimersheim)'s Twitter Profile
Stefan Heimersheim

@sheimersheim

Research Scientist @apolloaisafety • PhD candidate at @cambridge_astro • He/him.

ID: 2475281803

Joined: 03-05-2014 10:16:57

57 Tweets

296 Followers

174 Following

Apollo Research (@apolloaievals)

We’ve released a new mechanistic interpretability approach. We use the loss landscape to identify computationally relevant features and interactions. Then, we build a full interaction graph and interpret it. Theory: arxiv.org/abs/2405.10927 Experimental: arxiv.org/abs/2405.10928

Nora (@schottkey)

1/7 Excited to share our recent project from LASR Labs! We investigated the utility of SAE latents in language models. #MechanisticInterpretability #SAE Here's what we discovered: 🧠🔍

Jai Bhagat (@jkbhagatio)

🧵Excited to announce our work on analyzing toy models of computation in superposition (CiS) -- was fun working with Sara Molas Medina, Giorgi Giglemiani, and Stefan Heimersheim on this! ❗Main takeaway: we show that toy models in Apollo Research's APD paper are not actually performing CiS!

Luca Baroni (@luchinobaroni)

Excited to share our new paper (+ LW post): "Transformers Don't Need LayerNorm at Inference Time". We show that LayerNorm (LN) can be removed from GPT-2 models (even XL) with minimal performance loss. 📄 arxiv.org/abs/2507.02559 🧵