Aaron Mueller (@amuuueller) 's Twitter Profile
Aaron Mueller

@amuuueller

Postdoc with @boknilev/@davidbau ≡ Incoming Asst. Prof. in CS at @BU_Tweets ≡ Interested in #NLProc, interpretability, and evaluation ≡ Formerly: PhD @jhuclsp

ID: 3743366715

Website: http://aaronmueller.github.io · Joined: 22-09-2015 23:04:12

244 Tweets

1.1K Followers

689 Following

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation for when & why linear representations form in large (or small) language models. Led by Jack Merullo, w/ Noah A. Smith & Sarah Wiegreffe

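For intuition, "linearly represented" here means roughly that a single linear map carries a subject's hidden state to the representation of the object that completes a relation. The sketch below illustrates that kind of test on synthetic vectors; the array shapes, the least-squares fit, and the cosine score are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical setup: subject_reps[i] would be an LM's hidden state for subject i,
# and object_reps[i] the representation of the object completing the relation
# (e.g., "capital of"). Random data stands in for real activations so the sketch
# runs on its own.
rng = np.random.default_rng(0)
d_model, n_facts = 64, 200
subject_reps = rng.normal(size=(n_facts, d_model))
true_map = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
object_reps = subject_reps @ true_map + 0.1 * rng.normal(size=(n_facts, d_model))

# Split this relation's facts into train and held-out sets.
train_s, test_s = subject_reps[:160], subject_reps[160:]
train_o, test_o = object_reps[:160], object_reps[160:]

# Fit a single linear map W by least squares. If the relation is linearly
# represented, the same W should also work for held-out facts.
W, *_ = np.linalg.lstsq(train_s, train_o, rcond=None)

# Score: cosine similarity between predicted and actual object representations.
pred = test_s @ W
cos = np.sum(pred * test_o, axis=1) / (
    np.linalg.norm(pred, axis=1) * np.linalg.norm(test_o, axis=1)
)
print(f"mean held-out cosine similarity: {cos.mean():.3f}")
```

A high held-out score of this kind is the sort of signal usually read as "this relation is linearly decodable"; per the title, the paper ties when that happens to frequency in the pretraining data.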
babyLM (@babylmchallenge) 's Twitter Profile Photo

Close your books, test time! The evaluation pipelines are out, baselines are released and the challenge is on. There is still time to join and we are excited to learn from you on pretraining and the gaps between humans and models. *Don't forget to fast-eval on checkpoints

Ethan Gotlieb Wilcox (@wegotlieb) 's Twitter Profile Photo

📣 Paper Update 📣 It’s bigger! It’s better! Even if the language models aren’t. 🤖 New version of “Bigger is not always Better: The importance of human-scale language modeling for psycholinguistics” osf.io/preprints/psya…

Yonatan Belinkov (@boknilev) 's Twitter Profile Photo

BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! This edition will feature a new shared task on circuits/causal variable localization in LMs (details: blackboxnlp.github.io/2025/task). If you're into mech interp and care about evaluation, please submit!

Tomer Ashuach (@tomerashuach) 's Twitter Profile Photo

🚨New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data—emails, SSNs, URLs from their training. We propose a surgical method to unlearn it. 🧵👇 w/ Yonatan Belinkov, Martin Tutek. 1/8

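As rough intuition for "rank editing in the vocabulary space": read a hidden state out through the unembedding matrix and check where the sensitive token lands in the resulting logit ranking. The sketch below uses random stand-in weights and a crude activation-projection edit purely to illustrate that quantity; it is not the REVS method itself.

```python
import numpy as np

# Illustrative only: the matrices below are random stand-ins, not a real model,
# and the final "edit" is a crude projection rather than the REVS procedure.
rng = np.random.default_rng(0)
d_model, vocab_size = 512, 32_000
W_U = rng.normal(size=(vocab_size, d_model))   # unembedding: hidden state -> logits
sensitive_id = 1234                            # e.g., a token from a memorized email

# Construct a hidden state that strongly promotes the sensitive token.
hidden = rng.normal(size=d_model) + 0.2 * W_U[sensitive_id]

def vocab_rank(h, token_id):
    """Rank of token_id when h is read out through W_U (0 = highest logit)."""
    logits = W_U @ h
    return int(np.sum(logits > logits[token_id]))

print("rank before edit:", vocab_rank(hidden, sensitive_id))

# Remove the hidden state's component along the sensitive token's unembedding
# direction. A rank-editing method would instead adjust the model components
# that write this direction, pushing the token's rank far below anything a
# decoder would ever sample.
direction = W_U[sensitive_id]
edited = hidden - (hidden @ direction) / (direction @ direction) * direction
print("rank after edit:", vocab_rank(edited, sensitive_id))
```

The tweet describes a surgical unlearning method applied to the model itself; the snippet only illustrates the vocabulary-space rank such a method would monitor.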
Joe Stacey (@_joestacey_) 's Twitter Profile Photo

We have a new paper up on arXiv! 🥳🪇 The paper tries to improve the robustness of closed-source LLMs fine-tuned on NLI, assuming a realistic training budget of 10k training examples. Here's a 60 second rundown of what we found!

David Bau (@davidbau) 's Twitter Profile Photo

Dear MAGA friends, I have been worrying about STEM in the US a lot, because right now the Senate is writing new laws that cut 75% of the STEM budget in the US. Sorry for the long post, but the issue is really important, and I want to share what I know about it. The entire

Joshua Rozner (@jsrozner) 's Twitter Profile Photo

BabyLM's first constructions: new study on usage-based language acquisition in LMs, w/ Leonie Weissweiler, Cory Shain. Simple interventions show that LMs trained on cognitively plausible data acquire diverse constructions (cxns). babyLM 🧵

Jackson Petty (@jowenpetty) 's Twitter Profile Photo

How well can LLMs understand tasks with complex sets of instructions? We investigate through the lens of RELIC: REcognizing (formal) Languages In-Context, finding a significant overhang between what LLMs are able to do theoretically and how well they put this into practice.

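The in-context setup described here amounts to: put a formal grammar plus a few labeled strings in the prompt, then ask the model whether a new string belongs to the language. The toy grammar, sampling procedure, and prompt wording below are hypothetical stand-ins for that setup, not the RELIC benchmark itself.

```python
import random

# A toy instance: the language a^n b^m with n, m >= 1.
GRAMMAR = (
    "S -> A B\n"
    "A -> 'a' A | 'a'\n"
    "B -> 'b' B | 'b'"
)

def in_language(s: str) -> bool:
    """Ground-truth recognizer for a^n b^m (n, m >= 1)."""
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    j = i
    while j < len(s) and s[j] == "b":
        j += 1
    return i >= 1 and j > i and j == len(s)

random.seed(0)

def sample_string(positive: bool) -> str:
    s = "a" * random.randint(1, 4) + "b" * random.randint(1, 4)
    if not positive:
        s = "".join(random.sample(s, len(s)))  # shuffle the symbols;
        if in_language(s):                     # shuffling can leave it valid,
            s = "b" + s                        # so force a violation
    return s

examples = [(sample_string(p), p) for p in (True, False, True, False)]
query = sample_string(True)

# Assemble a few-shot recognition prompt for the model under evaluation.
prompt = "Grammar:\n" + GRAMMAR + "\n\n" + "\n".join(
    f"String: {s}\nIn language: {'yes' if label else 'no'}" for s, label in examples
) + f"\n\nString: {query}\nIn language:"
print(prompt)
```

The reported "overhang" is then the gap between what such grammars should, in principle, allow a model to recognize and how often it actually answers these prompts correctly.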
Nikhil Prakash (@nikhil07prakash) 's Twitter Profile Photo

How do language models track mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step in demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief tracking task, and surprisingly found that it

Aaron Mueller (@amuuueller) 's Twitter Profile Photo

If you're at #ICML2025, chat with me, Sarah Wiegreffe, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark. We're planning to keep this a living benchmark; come by and share your ideas/hot takes!
