Explainable AI (@xai_research) 's Twitter Profile
Explainable AI

@xai_research

Moved to 🦋! Explainable/Interpretable AI researchers and enthusiasts - DM to join the XAI Slack! Twitter and Slack maintained by @NickKroeger1

ID: 1508524378224427008

Joined: 28-03-2022 19:20:26

1.1K Tweets

2.2K Followers

783 Following

Ercong Nie (@nielklug) 's Twitter Profile Photo

ACL Time @ Bangkok 🇹🇭

Our GNNavi work will be presented in the poster session at 12:30 on Aug. 14 (Wed.). Welcome to drop by and exchange with us!

Looking forward to talking with people, especially those who are interested in multilingual & low-resource & LLM interpretability 🤗
Goodfire (@goodfireai) 's Twitter Profile Photo

We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are, to the best of our knowledge, the first open-source SAEs for models at this scale and capability level.
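For readers new to the technique: a sparse autoencoder here is trained to reconstruct a model's internal activations through a wide, sparsely activating bottleneck, so that individual latent features become easier to interpret. A minimal sketch of that idea in PyTorch (dimensions, the sparsity penalty, and the training step are illustrative choices, not Goodfire's released code):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: maps d_model activations to an overcomplete sparse code and back."""
    def __init__(self, d_model=4096, d_hidden=32768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        x_hat = self.dec(f)           # reconstruction of the original activations
        return x_hat, f

# Train to minimize reconstruction error plus an L1 penalty encouraging sparsity.
sae = SparseAutoencoder()
x = torch.randn(8, 4096)              # stand-in for residual-stream activations
x_hat, f = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * f.abs().mean()
loss.backward()
```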

Samuel Marks (@saprmarks) 's Twitter Profile Photo

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.

Michal Moshkovitz (@ml_theorist) 's Twitter Profile Photo

This Thursday (in 3 days), Yishay Mansour will discuss interpretable approximations: learning with interpretable models. Is it the same as regular learning? Attend the lecture to find out! 💻 Website: tverven.github.io/tiai-seminar/ Suraj Srinivas @ ICML Tim van Erven

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

LLMs are all circuits and patterns

Nice Paper for a long weekend read - "A Primer on the Inner Workings of Transformer-based Language Models"

📌 Provides a concise intro focusing on the generative decoder-only architecture.

📌 Introduces the Transformer layer components,
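As a quick reminder of what those layer components are, here is a minimal pre-norm decoder-only block in PyTorch: causal self-attention and an MLP, each feeding back into the residual stream (an illustrative sketch with arbitrary sizes, not code from the primer):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Pre-norm decoder-only block: causal self-attention + MLP, each with a residual."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True above the diagonal blocks attention to future positions.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=causal)
        x = x + a                       # residual stream after attention
        x = x + self.mlp(self.ln2(x))   # residual stream after the MLP
        return x

x = torch.randn(2, 16, 512)             # (batch, sequence, d_model)
print(DecoderLayer()(x).shape)           # torch.Size([2, 16, 512])
```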
Giang Nguyen (@giangnguyen2412) 's Twitter Profile Photo

Dylan Sam Hi Dylan, it reminds me of our paper, where we also train a model (model 2) on the output of another black-box model (model 1). Ultimately, we find that combining the outputs of model 2 and model 1 significantly improves performance. openreview.net/forum?id=OcFjq…
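The combination mentioned there can be as simple as averaging the two models' predicted class distributions; a toy sketch under that assumption (the paper's actual combination rule may differ):

```python
import numpy as np

# Hypothetical setup: blackbox_probs come from the frozen model 1,
# student_probs from model 2, which was trained on model 1's outputs.
blackbox_probs = np.array([[0.7, 0.3], [0.4, 0.6]])
student_probs  = np.array([[0.6, 0.4], [0.2, 0.8]])

# One simple combination: average the two predicted distributions.
combined = 0.5 * blackbox_probs + 0.5 * student_probs
prediction = combined.argmax(axis=1)
print(prediction)  # e.g. [0 1]
```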

Tim van Erven (@tverven) 's Twitter Profile Photo

In case you missed it: here is the recording of Yishay Mansour's talk about the ability of decision trees to approximate concepts: youtu.be/uOwuho2er58 For upcoming talks, check out the seminar website: tverven.github.io/tiai-seminar/
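For a concrete feel for decision trees approximating a concept, one common recipe is to fit a shallow tree to mimic a black-box model's predictions and then check how often they agree. A small sketch with scikit-learn (illustrative only, not the construction analyzed in the talk):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The "concept" to approximate: a black-box model's decision function.
blackbox = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Interpretable approximation: a shallow tree fit to the black box's labels.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, blackbox.predict(X))

agreement = (tree.predict(X) == blackbox.predict(X)).mean()
print(f"tree agrees with the black box on {agreement:.1%} of inputs")
```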

Apart Research (@apartresearch) 's Twitter Profile Photo

This week's Apart News brings you an *exclusive* interview with interpretability insider Myra Deng of Goodfire & revisits our Sparse Autoencoders Hackathon which featured a memorable talk from Google DeepMind's Neel Nanda.

Rudy Gilman (@rgilman33) 's Twitter Profile Photo

The later features in DINO-v2 are more abstract and semantically meaningful than I'd expected from the training objectives. This neuron responds only to hugs. Nothing else, just hugs.
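Findings like this typically come from reading activations at a chosen layer and ranking inputs by how strongly a single unit fires. A rough sketch using a forward hook (the hub entry point is DINOv2's published one, but the block attribute, block index, and neuron index below are placeholders, not the actual "hug" feature):

```python
import torch

# Load a small DINOv2 backbone from torch.hub (downloads weights on first use).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

acts = {}
def hook(_module, _inp, out):
    acts["block"] = out.detach()  # (batch, tokens, dim) token activations

# Attach to a late transformer block; block and neuron indices here are arbitrary.
handle = model.blocks[-2].register_forward_hook(hook)

img = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed image batch
with torch.no_grad():
    model(img)
handle.remove()

neuron = 123                              # hypothetical feature index
print(acts["block"][0, :, neuron].max())  # peak activation of that unit over tokens
```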

Mechanistic Interpretability for Vision @ CVPR2025 (@miv_cvpr2025) 's Twitter Profile Photo

πŸ” Curious about what's really happening inside vision models? Join us at the First Workshop on Mechanistic Interpretability for Vision (MIV) at #CVPR2025! πŸ“’ Website: sites.google.com/view/miv-cvpr2… Meet our amazing invited speakers! #CVPR2025 #MIV25 #MechInterp #ComputerVision

πŸ” Curious about what's really happening inside vision models?

Join us at the First Workshop on Mechanistic Interpretability for Vision (MIV) at <a href="/CVPR/">#CVPR2025</a>!

πŸ“’ Website: sites.google.com/view/miv-cvpr2…

Meet our amazing invited speakers!

 #CVPR2025 #MIV25 #MechInterp #ComputerVision
Chirag Agarwal (@_cagarwal) 's Twitter Profile Photo

Exciting opportunity at the intersection of climate science and XAI to work on groundbreaking research in attributing extreme precipitation events with multimodal models. Check out the details and help spread the word! #ClimateAI #Postdoc #UVA #Hiring Job description:

π™·πš’πš–πšŠ π™»πšŠπš”πš”πšŠπš›πšŠπš“πšž (@hima_lakkaraju) 's Twitter Profile Photo

Super excited to share our latest preprint that unifies multiple areas within explainable AI that have been evolving somewhat independently: 

1. Feature Attribution
2. Data Attribution
3. Model Component Attribution (aka Mechanistic Interpretability) 

arxiv.org/abs/2501.18887
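To ground the first of those three families, here is a minimal gradient-times-input feature attribution for a toy model (a generic illustration, not the unified estimator proposed in the preprint):

```python
import torch
import torch.nn as nn

# A stand-in predictor; in the preprint's framing this could be any differentiable model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(1, 10, requires_grad=True)
score = model(x).sum()
score.backward()

# Gradient-times-input: per-feature contribution estimate for this prediction.
attribution = (x.grad * x).detach().squeeze()
print(attribution)
```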
Archiki Prasad (@archikiprasad) 's Twitter Profile Photo

🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨
which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and debug code from generated tests.

UTGen+UTDebug improve LLM-based code debugging by addressing 3 key
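Conceptually, the loop pairs generated tests with execution feedback; a highly simplified sketch of that loop, with the model call stubbed out (the real UTGen/UTDebug pipeline is more involved):

```python
import subprocess
import textwrap

def generate_unit_test(code: str) -> str:
    # Placeholder for the model call: a UTGen-style LLM would produce this from the code.
    return textwrap.dedent("""
        from solution import add

        def test_add():
            assert add(2, 3) == 5
    """)

buggy_code = "def add(a, b):\n    return a - b   # bug: should be a + b\n"

with open("solution.py", "w") as f:
    f.write(buggy_code)
with open("test_solution.py", "w") as f:
    f.write(generate_unit_test(buggy_code))

# The failing test output localizes the bug and can be fed back to the model for a repair step.
subprocess.run(["pytest", "test_solution.py", "-q"])
```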
Suraj Srinivas (@suuraj) 's Twitter Profile Photo

Our Theory of Interpretable AI seminar (tverven.github.io/tiai-seminar/) will soon celebrate its one-year anniversary! 🥳 As we step into our second year, we'd love to hear from you! What papers would you like to see discussed in our seminar in the future? 📚🔍 Tim van Erven Michal Moshkovitz