neuronpedia (@neuronpedia)'s Twitter Profile
neuronpedia

@neuronpedia

open source interpretability platform 🧠🧐

ID: 1679969101203247104

Link: http://neuronpedia.org · Joined: 14-07-2023 21:40:30

33 Tweets

477 Followers

10 Following

neuronpedia (@neuronpedia)'s Twitter Profile Photo

Announcement: we're open sourcing Neuronpedia! 🚀 This includes all our mech interp tools: the interpretability API, steering, UI, inference, autointerp, search, plus 4 TB of data - cited by 35+ research papers and used by 50+ write-ups. What you can do with OSS Neuronpedia: 🧵
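
For a rough sense of what that interpretability API looks like from the outside, here is a minimal sketch of fetching one feature's data over HTTP. The endpoint path, model/source identifiers, and response field names are assumptions for illustration, not taken from the announcement.

```python
# Minimal sketch: querying a hosted Neuronpedia-style feature endpoint.
# Endpoint path, ids, and field names below are assumptions -- check the
# open-sourced API docs for the real ones.
import requests

MODEL = "gemma-2-2b"                  # hypothetical model id
SOURCE = "20-gemmascope-res-16k"      # hypothetical SAE / source id
INDEX = 123                           # hypothetical feature index

url = f"https://www.neuronpedia.org/api/feature/{MODEL}/{SOURCE}/{INDEX}"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
feature = resp.json()

# "explanations" and "activations" are assumed field names.
print(feature.get("explanations"))
print(len(feature.get("activations", [])), "stored activation records")
```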

Aryaman Arora (@aryaman2020)'s Twitter Profile Photo

i forgot to tweet about this, but the very cool people at neuronpedia graciously hosted the steering vectors we trained on AxBench for Gemma-2-2B and 9B, w/ max activating examples and interactive steering: neuronpedia.org/axbench
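
A minimal sketch of how a hosted steering vector like these gets applied at inference time: add the vector to the residual stream at one layer via a forward hook. The layer index, scale, and random stand-in vector are placeholders, not AxBench's actual recipe.

```python
# Sketch of activation steering with a fixed vector added to the residual
# stream of one decoder layer during generation. Layer, scale, and the vector
# itself are placeholders, not AxBench's trained setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-2b"   # model referenced in the tweet
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

layer = 12                          # hypothetical layer
scale = 8.0                         # hypothetical steering strength
steer = torch.randn(model.config.hidden_size, dtype=torch.bfloat16)  # stand-in vector

def add_steering(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + scale * steer.to(hidden.device)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[layer].register_forward_hook(add_steering)
ids = tok("The translator's style is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```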

Daniel Scalena (@daniel_sc4)'s Twitter Profile Photo

📢 New paper: Applied interpretability 🤝 MT personalization! We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 📚 SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality. 🧵1/
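
A minimal sketch of the SAE-steering idea the paper contrasts with few-shot prompting: take one row of an SAE decoder as the steering direction and add it to the residual stream. The sizes, feature index, and decoder matrix below are stand-ins, not the paper's setup.

```python
# Sketch of SAE feature steering (not the paper's code): one decoder row
# serves as the steering direction instead of conditioning with few-shot examples.
import torch

d_model, n_features = 2304, 16384          # Gemma-2-2B-ish sizes (assumption)
W_dec = torch.randn(n_features, d_model)   # stand-in for a trained SAE decoder

feature_id = 4242                           # hypothetical "translator style" feature
direction = W_dec[feature_id] / W_dec[feature_id].norm()

def steer(hidden_states, alpha=6.0):
    """Add the unit-norm feature direction, scaled by alpha, at every position."""
    return hidden_states + alpha * direction.to(hidden_states.dtype)

# In practice this would run inside a forward hook on one decoder layer,
# as in the steering-vector sketch above.
acts = torch.randn(1, 8, d_model)
print(steer(acts).shape)  # torch.Size([1, 8, 2304])
```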

Shiyang Lai (@shiyanglai)'s Twitter Profile Photo

Our work found that semantic interference in LLMs is actually not that random. Certain polysemantic structures persist across models. This hints at something deeper: a shared representational structure that might reflect higher-order patterns. Our paper: arxiv.org/abs/2505.11611

Anthropic (@anthropicai)'s Twitter Profile Photo

Researchers can use the Neuronpedia interactive interface here: neuronpedia.org/gemma-2-2b/gra… And we’ve provided an annotated walkthrough: github.com/safety-researc… This project was led by participants in our Anthropic Fellows program, in collaboration with Decode Research.

Michael Hanna (@michaelwhanna)'s Twitter Profile Photo

Mateusz and I are excited to announce circuit-tracer, a library that makes circuit-finding simple! Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on neuronpedia: shorturl.at/SUX2A
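
As a library-agnostic illustration of what "get out a circuit" roughly means, here is a toy attribution score (activation × gradient) over stand-in features; this is not circuit-tracer's actual algorithm or API, which lives at the link in the tweet.

```python
# Toy attribution: score each upstream feature by activation * gradient of a
# target logit, then keep the strongest contributors as "edges" of a circuit.
import torch

torch.manual_seed(0)
feats = torch.randn(10, requires_grad=True)   # stand-in upstream feature activations
logit = (feats * torch.randn(10)).sum()       # stand-in target logit
logit.backward()

attribution = feats.detach() * feats.grad     # activation * gradient
top = attribution.abs().topk(3)
print("strongest contributing features:", top.indices.tolist())
```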

<a href="/mntssys/">Mateusz</a> and I are excited to announce circuit-tracer, a library that makes circuit-finding simple!

Just type in a sentence, and get out a circuit showing (some of) the features your model uses to predict the next token. Try it on <a href="/neuronpedia/">neuronpedia</a>: shorturl.at/SUX2A
Neel Nanda (@neelnanda5)'s Twitter Profile Photo

Fantastic to see Anthropic, in collaboration with neuronpedia, creating open source tools for studying circuits with transcoders. There's a lot of interesting work to be done. I'm also very glad someone finally found a use for our Gemma Scope transcoders! Credit to Arthur Conmy

swyx (@swyx)'s Twitter Profile Photo

I think this is the podcast that finally interp-pilled me. We snuck in a little intro featuring johnny's neuronpedia and asked about HOW IN THE HECK @anthropicai does all these insanely cracked interp visualizations for their "papers"

Adam Karvonen (@a_karvonen)'s Twitter Profile Photo

New Paper! Robustly Improving LLM Fairness in Realistic Settings via Interpretability. We show that adding realistic details to existing bias evals triggers race and gender bias in LLMs. Prompt tuning doesn’t fix it, but interpretability-based interventions can. 🧵1/7
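
The tweet doesn't spell out the intervention, so here is a minimal sketch of one common interpretability-based fix, directional ablation of a sensitive-attribute direction; the direction and shapes are placeholders, and this is not necessarily the paper's exact method.

```python
# Sketch of directional ablation: project a sensitive-attribute direction out
# of the residual stream so downstream computation cannot use it.
import torch

d_model = 4096
bias_dir = torch.randn(d_model)
bias_dir = bias_dir / bias_dir.norm()   # unit-norm "race/gender" direction (stand-in)

def ablate_direction(hidden_states: torch.Tensor) -> torch.Tensor:
    """Remove the component of every activation along bias_dir."""
    proj = (hidden_states @ bias_dir).unsqueeze(-1) * bias_dir
    return hidden_states - proj

acts = torch.randn(2, 5, d_model)
cleaned = ablate_direction(acts)
print((cleaned @ bias_dir).abs().max())  # ~0: nothing left along the direction
```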
