CogInterp Workshop @ NeurIPS 2025 (@coginterp)'s Twitter Profile
CogInterp Workshop @ NeurIPS 2025

@coginterp


https://coginterp.github.io/neurips2025/ · Joined 05-07-2025

5 Tweets · 81 Followers · 0 Following

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

For our second spotlight talk, Yifei Cao, Chonghao Cai, and Liyuan Li present work using hybrid neural-cognitive models to explain strategies in reversal learning.

Sonia Murthy (@soniakmurthy)

Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: arxiv.org/abs/2506.20666

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

Ari Holtzman (@universeinanegg) takes us into the mind of an LLM to help us understand how these models see the world, and what might be a good road forward to studying them.

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

Erin Grant (@ermgrant) discusses dissociations between function and representation, and asks whether representational alignment is enough for understanding deep neural networks.

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

For our third spotlight talk, Sonia Murthy (@soniakmurthy) uses probabilistic cognitive models to understand value trade-offs in LLMs that enable pragmatic reasoning about politeness in speech acts.

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

In our fourth spotlight talk, neural network legend Paul Smolensky uses symbolic programs such as production systems to understand how neural networks process symbols.

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

Our final speaker, Sydney Levine (@sydneymlevine), makes a radical proposal: building computational models of human moral judgements to use as an AI system for making moral judgements.

CogInterp Workshop @ NeurIPS 2025 (@coginterp)

Our Best Paper Award goes to Nathaniel Imel and Noga Zaslavsky (@NogaZaslavsky) for their excellent paper “Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression”!

Noga Zaslavsky (@nogazaslavsky)

Honored and thrilled that our work received the CogInterp Workshop @ NeurIPS 2025 best paper award! 💫 📄 Extended paper: arxiv.org/pdf/2509.08093 🧵 Highlights: x.com/NogaZaslavsky/… NeurIPS Conference #NeurIPS2025

Christopher Potts (@chrisgpotts)

Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:

Goodfire (@goodfireai)

Our last Stanford guest lecture: Ekdeep Singh on what counts as an explanation & a neuro-inspired "model systems approach" to interpretability. Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach).