Aaron Mueller (@amuuueller) 's Twitter Profile
Aaron Mueller

@amuuueller

Postdoc with @boknilev/@davidbau ≡ Incoming Asst. Prof. in CS at @BU_Tweets ≡ Interested in #NLProc, interpretability, and evaluation ≡ Formerly: PhD @jhuclsp

ID: 3743366715

Website: http://aaronmueller.github.io · Joined: 22-09-2015 23:04:12

244 Tweets

1.1K Followers

689 Following

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

💡 New ICLR paper! 💡 "On Linear Representations and Pretraining Data Frequency in Language Models": We provide an explanation for when & why linear representations form in large (or small) language models. Led by Jack Merullo, w/ Noah A. Smith & Sarah Wiegreffe

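For intuition, "linearly represented" here means roughly that a single linear map carries a subject's hidden state to the representation of the object that completes a relation. The sketch below illustrates that kind of test on synthetic vectors; the array shapes, the least-squares fit, and the cosine score are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Hypothetical setup: subject_reps[i] would be an LM's hidden state for subject i,
# and object_reps[i] the representation of the object completing the relation
# (e.g., "capital of"). Random data stands in for real activations so the sketch
# runs on its own.
rng = np.random.default_rng(0)
d_model, n_facts = 64, 200
subject_reps = rng.normal(size=(n_facts, d_model))
true_map = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
object_reps = subject_reps @ true_map + 0.1 * rng.normal(size=(n_facts, d_model))

# Split this relation's facts into train and held-out sets.
train_s, test_s = subject_reps[:160], subject_reps[160:]
train_o, test_o = object_reps[:160], object_reps[160:]

# Fit a single linear map W by least squares. If the relation is linearly
# represented, the same W should also work for held-out facts.
W, *_ = np.linalg.lstsq(train_s, train_o, rcond=None)

# Score: cosine similarity between predicted and actual object representations.
pred = test_s @ W
cos = np.sum(pred * test_o, axis=1) / (
    np.linalg.norm(pred, axis=1) * np.linalg.norm(test_o, axis=1)
)
print(f"mean held-out cosine similarity: {cos.mean():.3f}")
```

A high held-out score of this kind is the sort of signal usually read as "this relation is linearly decodable"; per the title, the paper ties when that happens to frequency in the pretraining data.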
babyLM (@babylmchallenge) 's Twitter Profile Photo

Close your books, test time! The evaluation pipelines are out, baselines are released and the challenge is on. There is still time to join and we are excited to learn from you on pretraining and the gaps between humans and models. *Don't forget to fast-eval on checkpoints

Ethan Gotlieb Wilcox (@wegotlieb) 's Twitter Profile Photo

📣 Paper Update 📣 It’s bigger! It’s better! Even if the language models aren’t. 🤖 New version of “Bigger is not always Better: The importance of human-scale language modeling for psycholinguistics” osf.io/preprints/psya…

Yonatan Belinkov (@boknilev) 's Twitter Profile Photo

BlackboxNLP will be co-located with #EMNLP2025 in Suzhou this November! This edition will feature a new shared task on circuits/causal variable localization in LMs (details: blackboxnlp.github.io/2025/task). If you're into mech interp and care about evaluation, please submit!

Tomer Ashuach (@tomerashuach) 's Twitter Profile Photo

🚨New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data—emails, SSNs, URLs from their training. We propose a surgical method to unlearn it. 🧵👇 w/ Yonatan Belinkov, Martin Tutek. 1/8

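As rough intuition for "rank editing in the vocabulary space": read a hidden state out through the unembedding matrix and check where the sensitive token lands in the resulting logit ranking. The sketch below uses random stand-in weights and a crude activation-projection edit purely to illustrate that quantity; it is not the REVS method itself.

```python
import numpy as np

# Illustrative only: the matrices below are random stand-ins, not a real model,
# and the final "edit" is a crude projection rather than the REVS procedure.
rng = np.random.default_rng(0)
d_model, vocab_size = 512, 32_000
W_U = rng.normal(size=(vocab_size, d_model))   # unembedding: hidden state -> logits
sensitive_id = 1234                            # e.g., a token from a memorized email

# Construct a hidden state that strongly promotes the sensitive token.
hidden = rng.normal(size=d_model) + 0.2 * W_U[sensitive_id]

def vocab_rank(h, token_id):
    """Rank of token_id when h is read out through W_U (0 = highest logit)."""
    logits = W_U @ h
    return int(np.sum(logits > logits[token_id]))

print("rank before edit:", vocab_rank(hidden, sensitive_id))

# Remove the hidden state's component along the sensitive token's unembedding
# direction. A rank-editing method would instead adjust the model components
# that write this direction, pushing the token's rank far below anything a
# decoder would ever sample.
direction = W_U[sensitive_id]
edited = hidden - (hidden @ direction) / (direction @ direction) * direction
print("rank after edit:", vocab_rank(edited, sensitive_id))
```

The tweet describes a surgical unlearning method applied to the model itself; the snippet only illustrates the vocabulary-space rank such a method would monitor.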
Joe Stacey (@_joestacey_) 's Twitter Profile Photo

We have a new paper up on arXiv! 🥳🪇 The paper tries to improve the robustness of closed-source LLMs fine-tuned on NLI, assuming a realistic training budget of 10k training examples. Here's a 60 second rundown of what we found!

David Bau (@davidbau) 's Twitter Profile Photo

Dear MAGA friends, I have been worrying about STEM in the US a lot, because right now the Senate is writing new laws that cut 75% of the STEM budget in the US. Sorry for the long post, but the issue is really important, and I want to share what I know about it. The entire

Joshua Rozner (@jsrozner) 's Twitter Profile Photo

BabyLM's first constructions: new study on usage-based language acquisition in LMs, w/ Leonie Weissweiler, Cory Shain. Simple interventions show that LMs trained on cognitively plausible data acquire diverse constructions (cxns). babyLM 🧵

Jackson Petty (@jowenpetty) 's Twitter Profile Photo

How well can LLMs understand tasks with complex sets of instructions? We investigate through the lens of RELIC: REcognizing (formal) Languages In-Context, finding a significant overhang between what LLMs are able to do theoretically and how well they put this into practice.

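The in-context setup described here amounts to: put a formal grammar plus a few labeled strings in the prompt, then ask the model whether a new string belongs to the language. The toy grammar, sampling procedure, and prompt wording below are hypothetical stand-ins for that setup, not the RELIC benchmark itself.

```python
import random

# A toy instance: the language a^n b^m with n, m >= 1.
GRAMMAR = (
    "S -> A B\n"
    "A -> 'a' A | 'a'\n"
    "B -> 'b' B | 'b'"
)

def in_language(s: str) -> bool:
    """Ground-truth recognizer for a^n b^m (n, m >= 1)."""
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    j = i
    while j < len(s) and s[j] == "b":
        j += 1
    return i >= 1 and j > i and j == len(s)

random.seed(0)

def sample_string(positive: bool) -> str:
    s = "a" * random.randint(1, 4) + "b" * random.randint(1, 4)
    if not positive:
        s = "".join(random.sample(s, len(s)))  # shuffle the symbols;
        if in_language(s):                     # shuffling can leave it valid,
            s = "b" + s                        # so force a violation
    return s

examples = [(sample_string(p), p) for p in (True, False, True, False)]
query = sample_string(True)

# Assemble a few-shot recognition prompt for the model under evaluation.
prompt = "Grammar:\n" + GRAMMAR + "\n\n" + "\n".join(
    f"String: {s}\nIn language: {'yes' if label else 'no'}" for s, label in examples
) + f"\n\nString: {query}\nIn language:"
print(prompt)
```

The reported "overhang" is then the gap between what such grammars should, in principle, allow a model to recognize and how often it actually answers these prompts correctly.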
Nikhil Prakash (@nikhil07prakash) 's Twitter Profile Photo

How do language models track mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step in demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief tracking task, and surprisingly found that it

Aaron Mueller (@amuuueller) 's Twitter Profile Photo

If you're at #ICML2025, chat with me, Sarah Wiegreffe, Atticus, and others at our poster 11am - 1:30pm at East #1205! We're establishing a 𝗠echanistic 𝗜nterpretability 𝗕enchmark. We're planning to keep this a living benchmark; come by and share your ideas/hot takes!
