Eric Bigelow (@ericbigelow) 's Twitter Profile
Eric Bigelow

@ericbigelow

AI interpretability + computational cognitive science. PhD student @PsychHarvard

ID: 67502027

21-08-2009 02:43:36

108 Tweets

159 Followers

772 Following

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

We’re excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣 How can we interpret the algorithms and representations underlying complex behavior in deep learning models? 🌐 coginterp.github.io/neurips2025/ 1/

Goodfire (@goodfireai)

New research: are prompting and activation steering just two sides of the same coin? Eric Bigelow, Daniel Wurgaft, Ekdeep Singh, and coauthors argue they are: ICL and steering have formally equivalent effects. (1/4)

Ekdeep Singh Lubana (@ekdeepl)

Our first Goodfire paper is out! In-context learning and activation steering are two commonly used inference-time control paradigms: we show a Bayesian Belief Update model unifies these protocols, offering a predictive theory that shows the protocols are duals of each other!

Daniel Wurgaft (@danielwurgaft)

Excited to see this out! 🎉 I see this paper as an exciting step towards connecting behavior and representation of belief updating, and demonstrating the benefits of top-down accounts for interpretability (aka 'coginterp') in large-scale systems! 1/3

Andrew Lampinen (@andrewlampinen)

Great work - this is more or less how I've thought about things, but really glad to see someone formalize it and make the argument so clearly! One question I have, though, is where we should think of those latent concepts being represented across the sequence... 1/2

Eric Bigelow (@ericbigelow)

Very cool work led by Ekdeep Singh Lubana, Can Rager, and Sumedh Hindupur! I'm particularly excited about how this examines some of the implicit assumptions baked into SAEs, and proposes a new approach that builds on a different foundation.