Eric Bigelow (@ericbigelow) Twitter Tweets • TwiCopy

Eric Bigelow

@ericbigelow

AI interpretability + computational cognitive science. PhD student @PsychHarvard

ID: 67502027

Joined: 21-08-2009 02:43:36

108 Tweets

159 Followers

772 Following

CogInterp Workshop @ NeurIPS 2025

@coginterp

8 months ago

We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2025/ 1/

Likes: 62, Replies: 1, Retweets: 19

Goodfire

@goodfireai

4 months ago

New research: are prompting and activation steering just two sides of the same coin? Eric Bigelow, Daniel Wurgaft, Ekdeep Singh, and coauthors argue they are: ICL and steering have formally equivalent effects. (1/4)

Likes: 342, Replies: 9, Retweets: 47

Ekdeep Singh Lubana

@ekdeepl

4 months ago

Our first Goodfire paper is out! In-context learning and activation steering are two commonly used inference-time control paradigms: we show a Bayesian Belief Update model unifies these protocols, offering a predictive theory that shows the protocols are duals of each other!

Likes: 33, Replies: 1, Retweets: 5

Daniel Wurgaft

@danielwurgaft

4 months ago

Excited to see this out! 🎉 I see this paper as an exciting step towards connecting behavior and representation of belief updating, and demonstrating the benefits of top-down accounts for interpretability (aka 'coginterp') in large-scale systems! 1/3

Likes: 136, Replies: 3, Retweets: 16

Andrew Lampinen

@andrewlampinen

4 months ago

Great work! This is more or less how I've thought about things, but really glad to see someone formalize it and make the argument so clearly! One question I have, though, is where we should think of those latent concepts being represented across the sequence... 1/2

Likes: 94, Replies: 1, Retweets: 6

Eric Bigelow

@ericbigelow

4 months ago

Very cool work led by Ekdeep Singh Lubana, Can Rager, and Sumedh Hindupur! I'm particularly excited about how this examines some of the implicit assumptions baked into SAEs, and proposes a new approach which builds on a different foundation.

Likes: 2, Replies: 0, Retweets: 1