Stefan Heimersheim (@sheimersheim) Twitter Tweets • TwiCopy

Stefan Heimersheim

@sheimersheim


Research Scientist @apolloaisafety • PhD candidate at @cambridge_astro • He/him.

ID: 2475281803

Joined: 03-05-2014 10:16:57

57 Tweets

296 Followers

174 Following

Andrew Carr (e/🤸)

@andrew_n_carr

3 years ago

Someone on Reddit is using Stable Diffusion to take selfies throughout time - here they are with the Trojan horse

Likes: 82,82K · Replies: 219 · Retweets: 4,4K


We’ve released a new mechanistic interpretability approach. We use the loss landscape to identify computationally relevant features and interactions. Then, we build a full interaction graph and interpret it. Theory: arxiv.org/abs/2405.10927 Experimental: arxiv.org/abs/2405.10928

Likes: 140 · Replies: 2 · Retweets: 28


Nora

@schottkey

2 years ago

1/7 Excited to share our recent project from LASR Labs! We investigated the utility of SAE latents in language models. #MechanisticInterpretability #SAE Here's what we discovered: 🧠🔍

Likes: 4 · Replies: 1 · Retweets: 1


Stefan Heimersheim

@sheimersheim

2 years ago

But can it play Doom?

Likes: 3 · Replies: 0 · Retweets: 0


Jai Bhagat

@jkbhagatio

10 months ago

🧵Excited to announce our work on analyzing toy models of computation in superposition (CiS) -- was fun working with Sara Molas Medina, Giorgi Giglemiani, and Stefan Heimersheim on this! ❗Main takeaway: we show that toy models in Apollo Research's APD paper are not actually performing CiS!

Likes: 4 · Replies: 2 · Retweets: 2


Luca Baroni

@luchinobaroni

9 months ago

Excited to share our new paper (+ LW post): "Transformers Don't Need LayerNorm at Inference Time" We show that LayerNorm (LN) can be removed from GPT-2 models (even XL) with minimal performance loss 📄 arxiv.org/abs/2507.02559 🧵

Likes: 2 · Replies: 2 · Retweets: 2
