Explainable AI (@xai_research) 's Twitter Profile
Explainable AI

@xai_research

Moved to 🦋! Explainable/Interpretable AI researchers and enthusiasts - DM to join the XAI Slack! Twitter and Slack maintained by @NickKroeger1

ID: 1508524378224427008

Joined: 28-03-2022 19:20:26

1.1K Tweets

2.2K Followers

783 Following

Ercong Nie (@nielklug) 's Twitter Profile Photo

ACL Time @ Bangkok 🇹🇭

Our GNNavi work will be presented in the poster session at 12:30 on Aug. 14 (Wed.). Welcome to drop by and exchange with us!

Looking forward to talking with people, especially those who are interested in multilingual & low-resource & LLM interpretability 🤗
Goodfire (@goodfireai) 's Twitter Profile Photo

We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are, to the best of our knowledge, the first open-source SAEs for models at this scale and capability level.
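For readers new to the technique: a sparse autoencoder here is trained to reconstruct a model's internal activations through a wide, sparsely activating bottleneck, so that individual latent features become easier to interpret. A minimal sketch of that idea in PyTorch (dimensions, the sparsity penalty, and the training step are illustrative choices, not Goodfire's released code):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: maps d_model activations to an overcomplete sparse code and back."""
    def __init__(self, d_model=4096, d_hidden=32768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        x_hat = self.dec(f)           # reconstruction of the original activations
        return x_hat, f

# Train to minimize reconstruction error plus an L1 penalty encouraging sparsity.
sae = SparseAutoencoder()
x = torch.randn(8, 4096)              # stand-in for residual-stream activations
x_hat, f = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * f.abs().mean()
loss.backward()
```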

Samuel Marks (@saprmarks) 's Twitter Profile Photo

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.

Michal Moshkovitz (@ml_theorist) 's Twitter Profile Photo

This Thursday (in 3 days), Yishay Mansour will discuss interpretable approximations: learning with interpretable models. Is it the same as regular learning? Attend the lecture to find out! 💻 Website: tverven.github.io/tiai-seminar/ Suraj Srinivas @ ICML Tim van Erven

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

LLMs are all circuits and patterns

Nice Paper for a long weekend read - "A Primer on the Inner Workings of Transformer-based Language Models"

📌 Provides a concise intro focusing on the generative decoder-only architecture.

📌 Introduces the Transformer layer components,
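As a quick reminder of what those layer components are, here is a minimal pre-norm decoder-only block in PyTorch: causal self-attention and an MLP, each feeding back into the residual stream (an illustrative sketch with arbitrary sizes, not code from the primer):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Pre-norm decoder-only block: causal self-attention + MLP, each with a residual."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True above the diagonal blocks attention to future positions.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=causal)
        x = x + a                       # residual stream after attention
        x = x + self.mlp(self.ln2(x))   # residual stream after the MLP
        return x

x = torch.randn(2, 16, 512)             # (batch, sequence, d_model)
print(DecoderLayer()(x).shape)           # torch.Size([2, 16, 512])
```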
Giang Nguyen (@giangnguyen2412) 's Twitter Profile Photo

Dylan Sam Hi Dylan, it reminds me of our paper, where we also train a model (model 2) on the output of another black-box model (model 1). Ultimately, we find that combining the outputs of model 2 and model 1 significantly improves performance. openreview.net/forum?id=OcFjq…
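The combination mentioned there can be as simple as averaging the two models' predicted class distributions; a toy sketch under that assumption (the paper's actual combination rule may differ):

```python
import numpy as np

# Hypothetical setup: blackbox_probs come from the frozen model 1,
# student_probs from model 2, which was trained on model 1's outputs.
blackbox_probs = np.array([[0.7, 0.3], [0.4, 0.6]])
student_probs  = np.array([[0.6, 0.4], [0.2, 0.8]])

# One simple combination: average the two predicted distributions.
combined = 0.5 * blackbox_probs + 0.5 * student_probs
prediction = combined.argmax(axis=1)
print(prediction)  # e.g. [0 1]
```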

Tim van Erven (@tverven) 's Twitter Profile Photo

In case you missed it: here is the recording of Yishay Mansour's talk about the ability of decision trees to approximate concepts: youtu.be/uOwuho2er58 For upcoming talks, check out the seminar website: tverven.github.io/tiai-seminar/
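For a concrete feel for decision trees approximating a concept, one common recipe is to fit a shallow tree to mimic a black-box model's predictions and then check how often they agree. A small sketch with scikit-learn (illustrative only, not the construction analyzed in the talk):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The "concept" to approximate: a black-box model's decision function.
blackbox = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Interpretable approximation: a shallow tree fit to the black box's labels.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, blackbox.predict(X))

agreement = (tree.predict(X) == blackbox.predict(X)).mean()
print(f"tree agrees with the black box on {agreement:.1%} of inputs")
```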

Apart Research (@apartresearch) 's Twitter Profile Photo

This week's Apart News brings you an *exclusive* interview with interpretability insider Myra Deng of Goodfire & revisits our Sparse Autoencoders Hackathon which featured a memorable talk from Google DeepMind's Neel Nanda.

Rudy Gilman (@rgilman33) 's Twitter Profile Photo

The later features in DINO-v2 are more abstract and semantically meaningful than I'd expected from the training objectives. This neuron responds only to hugs. Nothing else, just hugs.
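Findings like this typically come from reading activations at a chosen layer and ranking inputs by how strongly a single unit fires. A rough sketch using a forward hook (the hub entry point is DINOv2's published one, but the block attribute, block index, and neuron index below are placeholders, not the actual "hug" feature):

```python
import torch

# Load a small DINOv2 backbone from torch.hub (downloads weights on first use).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

acts = {}
def hook(_module, _inp, out):
    acts["block"] = out.detach()  # (batch, tokens, dim) token activations

# Attach to a late transformer block; block and neuron indices here are arbitrary.
handle = model.blocks[-2].register_forward_hook(hook)

img = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed image batch
with torch.no_grad():
    model(img)
handle.remove()

neuron = 123                              # hypothetical feature index
print(acts["block"][0, :, neuron].max())  # peak activation of that unit over tokens
```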

Mechanistic Interpretability for Vision @ CVPR2025 (@miv_cvpr2025) 's Twitter Profile Photo

πŸ” Curious about what's really happening inside vision models? Join us at the First Workshop on Mechanistic Interpretability for Vision (MIV) at #CVPR2025! πŸ“’ Website: sites.google.com/view/miv-cvpr2… Meet our amazing invited speakers! #CVPR2025 #MIV25 #MechInterp #ComputerVision

πŸ” Curious about what's really happening inside vision models?

Join us at the First Workshop on Mechanistic Interpretability for Vision (MIV) at <a href="/CVPR/">#CVPR2025</a>!

πŸ“’ Website: sites.google.com/view/miv-cvpr2…

Meet our amazing invited speakers!

 #CVPR2025 #MIV25 #MechInterp #ComputerVision
Chirag Agarwal (@_cagarwal) 's Twitter Profile Photo

Exciting opportunity at the intersection of climate science and XAI to work on groundbreaking research in attributing extreme precipitation events with multimodal models. Check out the details and help spread the word! #ClimateAI #Postdoc #UVA #Hiring Job description:

π™·πš’πš–πšŠ π™»πšŠπš”πš”πšŠπš›πšŠπš“πšž (@hima_lakkaraju) 's Twitter Profile Photo

Super excited to share our latest preprint that unifies multiple areas within explainable AI that have been evolving somewhat independently: 

1. Feature Attribution
2. Data Attribution
3. Model Component Attribution (aka Mechanistic Interpretability) 

arxiv.org/abs/2501.18887
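To ground the first of those three families, here is a minimal gradient-times-input feature attribution for a toy model (a generic illustration, not the unified estimator proposed in the preprint):

```python
import torch
import torch.nn as nn

# A stand-in predictor; in the preprint's framing this could be any differentiable model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(1, 10, requires_grad=True)
score = model(x).sum()
score.backward()

# Gradient-times-input: per-feature contribution estimate for this prediction.
attribution = (x.grad * x).detach().squeeze()
print(attribution)
```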
Archiki Prasad (@archikiprasad) 's Twitter Profile Photo

🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨
which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and debug code from generated tests.

UTGen+UTDebug improve LLM-based code debugging by addressing 3 key
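Conceptually, the loop pairs generated tests with execution feedback; a highly simplified sketch of that loop, with the model call stubbed out (the real UTGen/UTDebug pipeline is more involved):

```python
import subprocess
import textwrap

def generate_unit_test(code: str) -> str:
    # Placeholder for the model call: a UTGen-style LLM would produce this from the code.
    return textwrap.dedent("""
        from solution import add

        def test_add():
            assert add(2, 3) == 5
    """)

buggy_code = "def add(a, b):\n    return a - b   # bug: should be a + b\n"

with open("solution.py", "w") as f:
    f.write(buggy_code)
with open("test_solution.py", "w") as f:
    f.write(generate_unit_test(buggy_code))

# The failing test output localizes the bug and can be fed back to the model for a repair step.
subprocess.run(["pytest", "test_solution.py", "-q"])
```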
Suraj Srinivas (@suuraj) 's Twitter Profile Photo

Our Theory of Interpretable AI seminar (tverven.github.io/tiai-seminar/) will soon celebrate its one-year anniversary! 🥳 As we step into our second year, we'd love to hear from you! What papers would you like to see discussed in our seminar in the future? 📚🔍 Tim van Erven Michal Moshkovitz