Zhengxuan Wu (@zhengxuanzenwu) 's Twitter Profile
Zhengxuan Wu

@zhengxuanzenwu

member of technical staff @stanfordnlp, goes by zen, life is neither wind nor rain, nor clear skies

ID: 1288758678141599744

Website: https://nlp.stanford.edu/~wuzhengx/
Joined: 30-07-2020 08:50:01

330 Tweets

1.1K Followers

633 Following

Alex Tamkin (@alextamkin) 's Twitter Profile Photo

We'll be presenting our Codebook Features work at ICML this week! 

Stop by our poster session tomorrow (Tues) from 1:30pm - 3pm
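
As a rough illustration of what a codebook bottleneck can look like, the sketch below quantizes a hidden state into a sum of its top-k nearest code vectors. It is a generic PyTorch sketch, not the paper's implementation; the code count, k, and dot-product similarity are assumptions.

```python
import torch
import torch.nn as nn

class CodebookBottleneck(nn.Module):
    """Replace a hidden state with a sparse sum of learned, discrete codes."""
    def __init__(self, d_model: int, num_codes: int = 1024, k: int = 8):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, d_model))
        self.k = k

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model); score every position against every code
        scores = h @ self.codebook.T                   # (batch, seq, num_codes)
        top = scores.topk(self.k, dim=-1).indices      # k best codes per position
        return self.codebook[top].sum(dim=-2)          # sum of selected code vectors
```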
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

🚨When building LM systems for a task, should you explore finetuning or prompt optimization?

Paper w/ dilara Christopher Potts finds that you should do both!

New DSPy optimizers that alternate optimizing weights & prompts can deliver up to 26% gains over just optimizing one!
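
The alternating schedule described above can be sketched structurally as follows. `optimize_prompts` and `finetune_weights` are hypothetical stand-ins for the actual DSPy optimizers, which are not shown here; only the alternation loop is the point.

```python
def optimize_prompts(program, trainset, metric):
    # Hypothetical: search over instructions / few-shot demos with weights frozen.
    return program

def finetune_weights(program, trainset):
    # Hypothetical: distill traces produced under the current prompts into weights.
    return program

def alternate_weights_and_prompts(program, trainset, metric, rounds: int = 2):
    """Alternate the two optimizers so each step builds on the other's output."""
    for _ in range(rounds):
        program = optimize_prompts(program, trainset, metric)
        program = finetune_weights(program, trainset)
    return program
```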
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Hoping to see you at ACL 2024 #ACL2024!
From Christopher Potts’s group:
• CausalGym: Benchmarking causal interpretability
• RAVEL: Evaluating Interpretability Methods on Disentangling
• I am a Strange Dataset: Metalinguistic Tests for LLMs
• Mission: Impossible Language Models
caden (@kh4dien) 's Twitter Profile Photo

Sparse autoencoders recover a diversity of interpretable features but present an intractable problem of scale to human labelers. We build new automated pipelines to close the gap, scaling our understanding to GPT-2 and Llama-3 8B features.

@goncaloSpaulo Jacob Drori Nora Belrose
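
A minimal sketch of one stage of such an automated pipeline: collect the contexts where an SAE feature fires most strongly, then ask a labeling model for a short description. `ask_llm` is a hypothetical placeholder for whatever model call a real pipeline would use.

```python
import torch

def top_activating_contexts(feature_acts: torch.Tensor, tokens: list, k: int = 10):
    """feature_acts: (num_examples, seq_len) activations of a single SAE feature."""
    peak_vals, peak_pos = feature_acts.max(dim=1)           # strongest position per example
    best = peak_vals.topk(min(k, feature_acts.shape[0])).indices
    return [(tokens[i], peak_pos[i].item()) for i in best.tolist()]

def ask_llm(prompt: str) -> str:
    # Hypothetical: call the labeling LLM used by the pipeline.
    raise NotImplementedError

def label_feature(feature_acts: torch.Tensor, tokens: list) -> str:
    snippets = top_activating_contexts(feature_acts, tokens)
    shown = "\n".join(" ".join(toks) + f"  (peak at token {pos})" for toks, pos in snippets)
    return ask_llm("Give a short label for the pattern these snippets share:\n" + shown)
```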
Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

An LLM memorization riddle: A Pythia-6.9B checkpoint generates the following Output, which occurs only 1 time in the Pile. Is this a verbatim memorization?
AdapterHub (@adapterhub) 's Twitter Profile Photo

🎉Adapters 1.0 is here!🚀 Our open-source library for modular and parameter-efficient fine-tuning got a major upgrade! v1.0 is packed with new features (ReFT, Adapter Merging, QLoRA, ...), new models & improvements! Blog: adapterhub.ml/blog/2024/08/a… Highlights in the thread! 🧵👇
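
As a library-independent illustration of one of the methods such a toolkit covers, a bottleneck adapter adds a small trainable down-/up-projection with a residual connection inside an otherwise frozen layer; the dimensions below are illustrative assumptions, not the library's defaults.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A small trainable block inserted into a frozen transformer layer."""
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Only these few parameters train; the residual preserves the frozen
        # layer's output when the adapter's contribution is near zero.
        return h + self.up(self.act(self.down(h)))
```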

Karel D’Oosterlinck (@kareldoostrlnck) 's Twitter Profile Photo

Aligning Language Models with preferences leads to stronger and safer models (GPT3 → ChatGPT). However, preferences (RLHF) contain irrelevant signals, and alignment objectives (e.g. DPO) can actually hurt model performance.

We tackle both, leading to a ~2x performance boost.
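
For context on the alignment objectives mentioned above, vanilla DPO can be written in a few lines; this is a generic sketch of the standard loss, not the paper's modified objective, and beta is an illustrative choice.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Standard DPO: prefer the chosen response relative to a frozen reference.
    Inputs are per-example sequence log-probabilities."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```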
Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

Stanford NLP Group awards at #ACL2024
▸ Best paper award
Julie Kallini ✨ et al
▸ Outstanding paper award
Aryaman Arora et al
▸ Outstanding paper award
Weiyan Shi et al
▸ Best societal impact award
Weiyan Shi et al
▸ 10-year test of time award
Christopher Manning et al
Congratulations! 🥂
Dan Jurafsky (@jurafsky) 's Twitter Profile Photo

It's back-to-school time and so here's the Fall '24 release of draft chapters for Speech and Language Processing! web.stanford.edu/~jurafsky/slp3/

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

The Linear Representation Hypothesis is now widely adopted despite its highly restrictive nature. Here, Csordás Róbert, Atticus Geiger, Christopher Manning & I present a counterexample to the LRH and argue for more expressive theories of interpretability: arxiv.org/abs/2408.10920

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

Intervention-based approaches to mechanistic interpretability have progressed at an astounding rate recently. In our new paper (a major update to a 2023 ms), we provide a formal framework and show how to express many methods within this framework: arxiv.org/abs/2301.04709
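
A toy illustration of the kind of intervention these methods build on: cache an intermediate activation from a run on one input and patch it into a run on another. This is a generic PyTorch-hook sketch, not the paper's formal framework or any particular library.

```python
import torch
import torch.nn as nn

# Toy two-layer model; the intervention site is the post-ReLU hidden state.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
site = model[1]
base, source = torch.randn(1, 4), torch.randn(1, 4)
cache = {}

def cache_hook(module, inputs, output):
    cache["act"] = output.detach()   # remember the source run's activation

def patch_hook(module, inputs, output):
    return cache["act"]              # returning a tensor replaces the output

handle = site.register_forward_hook(cache_hook)
model(source)                        # 1) run the source input and cache the site
handle.remove()

handle = site.register_forward_hook(patch_hook)
patched = model(base)                # 2) run the base input with the source's state
handle.remove()
```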

Connor Shorten (@cshorten30) 's Twitter Profile Photo

I am BEYOND EXCITED to publish our interview with Krista Opsahl-Ong from Stanford AI Lab! 🔥

Krista is the lead author of MIPRO, short for Multi-prompt Instruction Proposal Optimizer, and one of the leading developers and scientists behind DSPy!

This was such
xuan (ɕɥɛn / sh-yen) (@xuanalogue) 's Twitter Profile Photo

Should AI be aligned with human preferences, rewards, or utility functions?

Excited to finally share a preprint that Micah Carroll Matija Franklin Hal Ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

🔗 Thoughts on Research Impact in AI.

Grad students often ask: how do I do research that makes a difference in the current, crowded AI space?

This is a blogpost that summarizes my perspective in six guidelines for making research impact via open-source artifacts. Link below.
CLS (@chengleisi) 's Twitter Profile Photo

Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas?

After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers.
Yijia Shao (@echoshao8899) 's Twitter Profile Photo

Language models today are (1) widely used in personalized contexts and (2) to build systems that interface with tools. Do they respect privacy when helping with daily tasks like emailing? Introducing PrivacyLens to evaluate if LMs know privacy norms in action at inference time!

Tristan Thrush (@tristanthrush) 's Twitter Profile Photo

Do you want to select great LLM pretraining data but don’t have 1000 H100s for a ton of mixture experiments?

What about a method that requires none of your own training, matches the best known existing method, and has some nice theory?

New preprint: Perplexity Correlations
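
A rough sketch of the kind of statistic the title points at: across a set of existing public models (no new training), correlate each pretraining domain's log-perplexity with a downstream benchmark score, and upweight domains where lower loss tracks higher scores. The data and the simple ranking rule here are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical observational data: rows are existing pretrained models,
# columns are pretraining domains; one benchmark score per model.
log_ppl = np.random.rand(20, 5)    # (num_models, num_domains)
bench = np.random.rand(20)         # benchmark accuracy per model

# Correlate each domain's log-perplexity with accuracy across models.
corr = []
for d in range(log_ppl.shape[1]):
    rho, _ = spearmanr(log_ppl[:, d], bench)
    corr.append(rho)

# Domains whose perplexity correlates most negatively with accuracy
# are the ones to upweight when selecting pretraining data.
ranking = np.argsort(np.array(corr))
print("domain preference order:", ranking)
```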
Maheep Chaudhary | महीप चौधरी💡 (@chaudharymaheep) 's Twitter Profile Photo

🚨 New paper alert! 🚨

SAEs 👾 are a hot topic in mechanistic interpretability 🛠️, but how well do they really work?

We evaluated open-source SAEs from OpenAI, Apollo Research, and Joseph Bloom on GPT-2 small and found they struggle to disentangle knowledge compared to neurons.
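
For readers unfamiliar with the object being evaluated, a minimal sparse autoencoder over model activations looks roughly like this; the dictionary expansion factor and L1 penalty weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decompose activations into an overcomplete set of sparsely active features."""
    def __init__(self, d_model: int = 768, d_dict: int = 768 * 8):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(feats)           # reconstruction of the input
        return recon, feats

def sae_loss(x, recon, feats, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparsity.
    return (recon - x).pow(2).mean() + l1_coeff * feats.abs().mean()
```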