Zhengxuan Wu (@zhengxuanzenwu) Twitter Tweets • TwiCopy

Aryaman Arora

@aryaman2020

2 months ago

pyvene has a docs website now: stanfordnlp.github.io/pyvene

Alex Tamkin

@alextamkin

2 months ago

We'll be presenting our Codebook Features work at ICML this week! Stop by our poster session tomorrow (Tues) from 1:30pm - 3pm

🚨When building LM systems for a task, should you explore finetuning or prompt optimization? Paper w/ dilara & Christopher Potts finds that you should do both! New DSPy optimizers that alternate optimizing weights & prompts can deliver up to 26% gains over just optimizing one!

Stanford NLP Group

@stanfordnlp

2 months ago

Hoping to see you at ACL 2024 #ACL2024! From Christopher Potts’s group:
• CausalGym: Benchmarking causal interpretability
• RAVEL: Evaluating Interpretability Methods on Disentangling
• I am a Strange Dataset: Metalinguistic Tests for LLMs
• Mission: Impossible Language Models

caden

@kh4dien

2 months ago

Sparse autoencoders recover a diversity of interpretable features but present an intractable problem of scale to human labelers. We build new automated pipelines to close the gap, scaling our understanding to GPT-2 and Llama-3 8B features. @goncaloSpaulo Jacob Drori Nora Belrose

Christopher Potts

@chrisgpotts

2 months ago

An LLM memorization riddle: A Pythia-6.9B checkpoint generates the following Output, which occurs only 1 time in the Pile. Is this verbatim memorization?

Mor Geva

@megamor2

2 months ago

Attending #ACL2024? Come hear about our recent work from TAU/Google (with collaborators!) about interpretability, knowledge, and reasoning in LLMs! Sohee Yang @ ACL 2024 Jesujoba Alabi Alon Jacovi Gal Yona Zhengxuan Wu

AdapterHub

@adapterhub

a month ago

🎉Adapters 1.0 is here!🚀 Our open-source library for modular and parameter-efficient fine-tuning got a major upgrade! v1.0 is packed with new features (ReFT, Adapter Merging, QLoRA, ...), new models & improvements! Blog: adapterhub.ml/blog/2024/08/a… Highlights in the thread! 🧵👇

Karel D’Oosterlinck

@kareldoostrlnck

a month ago

Aligning Language Models with preferences leads to stronger and safer models (GPT3 → ChatGPT). However, preferences (RLHF) contain irrelevant signals, and alignment objectives (e.g. DPO) can actually hurt model performance. We tackle both, leading to a ~2x performance boost.

Stanford NLP Group

@stanfordnlp

a month ago

Stanford NLP Group awards at #ACL2024
▸ Best paper award: Julie Kallini ✨ et al
▸ Outstanding paper award: Aryaman Arora et al
▸ Outstanding paper award: Weiyan Shi et al
▸ Best societal impact award: Weiyan Shi et al
▸ 10 year test of time award: Christopher Manning et al
Congratulations! 🥂

Dan Jurafsky

@jurafsky

a month ago

It's back-to-school time and so here's the Fall '24 release of draft chapters for Speech and Language Processing! web.stanford.edu/~jurafsky/slp3/

Christopher Potts

@chrisgpotts

a month ago

The Linear Representation Hypothesis is now widely adopted despite its highly restrictive nature. Here, Csordás Róbert, Atticus Geiger, Christopher Manning & I present a counterexample to the LRH and argue for more expressive theories of interpretability: arxiv.org/abs/2408.10920

Christopher Potts

@chrisgpotts

a month ago

Intervention-based approaches to mechanistic interpretability have progressed at an astounding rate recently. In our new paper (a major update to a 2023 ms), we provide a formal framework and show how to express many methods within this framework: arxiv.org/abs/2301.04709

Connor Shorten

@cshorten30

a month ago

I am BEYOND EXCITED to publish our interview with Krista Opsahl-Ong from Stanford AI Lab! 🔥 Krista is the lead author of MIPRO, short for Multi-prompt Instruction Proposal Optimizer, and one of the leading developers and scientists behind DSPy! This was such

xuan (ɕɥɛn / sh-yen)

@xuanalogue

24 days ago

Should AI be aligned with human preferences, rewards, or utility functions? Excited to finally share a preprint that Micah Carroll Matija Franklin Hal Ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!

Omar Khattab

@lateinteraction

23 days ago

🔗 Thoughts on Research Impact in AI. Grad students often ask: how do I do research that makes a difference in the current, crowded AI space? This is a blogpost that summarizes my perspective in six guidelines for making research impact via open-source artifacts. Link below.

CLS

@chengleisi

18 days ago

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.

Yijia Shao

@echoshao8899

17 days ago

Language models today are (1) widely used in personalized contexts and (2) used to build systems that interface with tools. Do they respect privacy when helping with daily tasks like emailing? Introducing PrivacyLens to evaluate if LMs know privacy norms in action at inference time!

Tristan Thrush

@tristanthrush

17 days ago

Do you want to select great LLM pretraining data but don’t have 1000 H100s for a ton of mixture experiments? What about a method that requires none of your own training, matches the best known existing method, and has some nice theory? New preprint: Perplexity Correlations

Maheep Chaudhary | महीप चौधरी💡

@chaudharymaheep

17 days ago

🚨 New paper alert! 🚨 SAEs 👾 are a hot topic in mechanistic interpretability 🛠️, but how well do they really work? We evaluated open-source SAEs from OpenAI, Apollo Research, and Joseph Bloom on GPT-2 small and found they struggle to disentangle knowledge compared to neurons.
