Yichen (Zach) Wang (@yichenzw) 's Twitter Profile
Yichen (Zach) Wang

@yichenzw

Incoming NLP Ph.D. @UChicagoCI | Interning @UWNLP @Tsvetshop (@BerkeleyNLP before) | Honors CS B.S. @XJTU1896 '24

ID: 1626708787750109184

http://yichenzw.com · Joined 17-02-2023 22:23:06

75 Tweets

258 Followers

307 Following

Jack Jingyu Zhang @ NAACL🌵 (@jackjingyuzhang) 's Twitter Profile Photo

🤖 LLMs are powerful, but their "one-size-fits-all" safety alignment limits flexibility. Safety standards vary across cultures and users—what’s safe in one context might not be in another. 🌍 We propose ✨Controllable Safety Alignment✨ for inference-time safety adaptation! 🧵👇

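In spirit, the adaptation happens through a natural-language safety config supplied at inference time; a minimal sketch of that interface (hypothetical message layout, not the paper's exact prompt format):

def build_messages(safety_config, user_prompt):
    # The safety config is free-form text describing what is allowed
    # in this deployment context; the model is expected to follow it.
    return [
        {"role": "system", "content": "Safety configuration:\n" + safety_config},
        {"role": "user", "content": user_prompt},
    ]
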
Liam Dugan (@liamdugan_) 's Twitter Profile Photo

The deadline to submit a detector to the RAID shared task at COLING 2025 has been extended to November 2nd! There's plenty of time left to join the task + submission is quick and easy. Check out the Github for more info! github.com/liamdugan/COLI…

Mina Lee (@minalee__) 's Twitter Profile Photo

Interested in writing with AI? ✍️ Please apply to be a **postdoc** in my group through the UChicago DSI Scholars program! 🤠 Research in my group: minalee-research.github.io/research.html Application: datascience.uchicago.edu/research/postd… (review begins on Dec 6)

Chenghao Yang (@chrome1996) 's Twitter Profile Photo

Happy Thanksgiving! Inspired by many great bloggers (Sasha Rush, Yao Fu), I made a tutorial about the "inference-time compute" tech showcased by O1. I incorporate insights from Sasha's great talk and ongoing O1 replications. Video: youtu.be/_Bw5o55SRL8. Feedback welcome!
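
One canonical inference-time compute technique such tutorials cover is best-of-n sampling; a generic sketch, where generate and score are hypothetical callables (score might be a reward model):

def best_of_n(generate, score, prompt, n=8):
    # Spend extra inference-time compute: draw n candidate generations
    # and keep the one the scorer rates highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)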

Rock Pang (@rockpang6) 's Twitter Profile Photo

🤔Interested in how #HCI thinks about using #LLMs, or looking to understand best practices for human-LLM interaction? 🚨🚨New paper: Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review

Yuntian Deng (@yuntiandeng) 's Twitter Profile Photo

For those curious about how o3-mini performs on multi-digit multiplication, here's the result. It does much better than o1 but still struggles past 13×13. (Same evaluation setup as before, but with 40 test examples per cell.)

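A minimal sketch of that evaluation grid, assuming the setup is random i-digit × j-digit products with 40 examples per cell (model_multiply is a hypothetical stand-in for querying the model):

import random

def eval_cell(model_multiply, i_digits, j_digits, n_examples=40):
    # Accuracy on random i-digit × j-digit multiplication problems.
    correct = 0
    for _ in range(n_examples):
        a = random.randint(10 ** (i_digits - 1), 10 ** i_digits - 1)
        b = random.randint(10 ** (j_digits - 1), 10 ** j_digits - 1)
        if model_multiply(a, b) == a * b:
            correct += 1
    return correct / n_examples

def accuracy_grid(model_multiply, max_digits=20):
    # One heatmap cell per (i, j) digit-count pair.
    return [[eval_cell(model_multiply, i, j) for j in range(1, max_digits + 1)]
            for i in range(1, max_digits + 1)]
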
Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

Is a single accuracy number all we can get from model evals?🤔 🚨Does NOT tell where the model fails 🚨Does NOT tell how to improve it Introducing EvalTree🌳 🔍identifying LM weaknesses in natural language 🚀weaknesses serve as actionable guidance (paper&demo 🔗in🧵) [1/n]
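
The motivation is easy to see with a toy per-category breakdown (this is not EvalTree itself, just the failure mode it addresses; results is a hypothetical list of (category, is_correct) pairs):

from collections import defaultdict

def per_category_accuracy(results):
    # One aggregate number can hide a category where the model always fails.
    buckets = defaultdict(list)
    for category, is_correct in results:
        buckets[category].append(is_correct)
    return {c: sum(v) / len(v) for c, v in buckets.items()}

results = [("algebra", True), ("algebra", True),
           ("geometry", False), ("geometry", False)]
print(per_category_accuracy(results))  # 50% overall, but geometry is 0%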

Abe Hou (@abe_hou) 's Twitter Profile Photo

👁️Recent works use LLMs for social simulations—but can these agents help shape effective policies? 💥Our new paper tackles a bold question many have wondered about: Can generative agent societies simulate to inform public health policy? 🔗: arxiv.org/abs/2503.09639

Xuandong Zhao (@xuandongzhao) 's Twitter Profile Photo

🚨 New Paper Alert 🚨 Are you really getting the model you paid for? In our latest work, we uncover a critical trust gap in LLM APIs—and propose methods to audit for covert model substitution. 🕵️‍♂️

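One simple flavor of such an audit (a toy sketch, not the paper's method): snapshot greedy outputs on fixed probe prompts, then re-check the API against that fingerprint. A real audit would need statistical tests, since even temperature-0 decoding can vary across hardware and serving stacks.

import hashlib

def fingerprint(responses):
    # Hash the concatenated probe outputs into a stable reference.
    return hashlib.sha256("\n".join(responses).encode()).hexdigest()

def audit(query_api, probe_prompts, reference_fp):
    # query_api is a hypothetical prompt -> greedy-completion callable.
    responses = [query_api(p) for p in probe_prompts]
    return fingerprint(responses) == reference_fp
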
Dang Nguyen (@divingwithorcas) 's Twitter Profile Photo

1/n You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
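
A classic recipe for this kind of surgical intervention (a generic sketch, not necessarily the paper's method) is to project an estimated bias direction out of the hidden states:

import numpy as np

def remove_direction(H, v):
    # Remove the component of each hidden state along unit direction v.
    v = v / np.linalg.norm(v)
    return H - np.outer(H @ v, v)

H = np.random.randn(8, 16)   # 8 hidden states of dimension 16
v = np.random.randn(16)      # estimated bias direction
H_clean = remove_direction(H, v)
print(np.allclose(H_clean @ (v / np.linalg.norm(v)), 0))  # True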

Jack Jingyu Zhang @ NAACL🌵 (@jackjingyuzhang) 's Twitter Profile Photo

Our Controllable Safety Alignment paper will be presented at #ICLR2025 this week in Singapore 🇸🇬! We've released our code and the human-authored CoSApien👥 dataset: 👉 aka.ms/controllable-s… Watch the short video summary here: 🎬 youtube.com/watch?v=kDioFn…

Xiao Pu (@xiaosophiapu) 's Twitter Profile Photo

🧠 Reasoning models often overthink. 🚀 In our new paper, we show: 1️⃣ Two overthinking scores. 2️⃣ DUMB500 — a benchmark of extremely easy questions. 3️⃣ THOUGHT TERMINATOR — a decoding method that reduces token waste by up to 90%, often improving accuracy. Details below 👇

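A toy illustration of the idea of cutting off an overthinking trace (not THOUGHT TERMINATOR itself, whose stopping criterion the paper defines; generate_step is a hypothetical next-token callable):

def decode_with_budget(generate_step, prompt, budget=256,
                       terminator="\nFinal answer:"):
    # generate_step returns the next token as a string, or None at
    # end-of-sequence.
    out = ""
    for _ in range(budget):
        tok = generate_step(prompt + out)
        if tok is None:           # model finished within budget
            return out
        out += tok
    return out + terminator       # budget spent: force an answer now
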
Kevin Yang (@kevinyang41) 's Twitter Profile Photo

Will be at NAACL next week, excited to share two of our papers: FACTTRACK: Time-Aware World State Tracking in Story Outlines arxiv.org/abs/2407.16347 THOUGHTSCULPT: Reasoning with Intermediate Revision and Search arxiv.org/abs/2404.05966 Shoutout to first authors Zhiheng LYU and

William Merrill (@lambdaviking) 's Twitter Profile Photo

Excited to announce I'll be starting as an assistant professor at TTIC for fall 2026! In the meantime, I'll be graduating and hanging around Ai2 in Seattle🏔️

Niloofar (on faculty job market!) (@niloofar_mire) 's Twitter Profile Photo

📣Thrilled to announce I’ll join Carnegie Mellon University (CMU Engineering & Public Policy & Language Technologies Institute | @CarnegieMellon) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at AI at Meta FAIR in SF, working with Kamalika Chaudhuri’s amazing team on privacy, security, and reasoning in LLMs!

Harvey Yiyun Fu (@harveyiyun) 's Twitter Profile Photo

LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing? 🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents. paper:

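The task construction is easy to sketch (a toy version; the benchmark's actual documents and scoring will differ):

import random

def make_absence_example(doc_lines, n_remove=3, seed=0):
    # Delete a few lines; the model sees the original and the edited
    # copy and must name what is missing.
    rng = random.Random(seed)
    removed = set(rng.sample(range(len(doc_lines)), n_remove))
    kept = [l for i, l in enumerate(doc_lines) if i not in removed]
    gold = [doc_lines[i] for i in sorted(removed)]
    return kept, gold

def recall(predicted_missing, gold):
    return len(set(predicted_missing) & set(gold)) / len(gold)
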
Ari Holtzman (@universeinanegg) 's Twitter Profile Photo

New benchmark! LLMs can retrieve bits of information from ridiculously long contexts (needle-in-a-haystack) but they can't tell what's missing from relatively short documents (AbsenceBench). We can't trust LLMs to annotate or judge documents if they can't see negative space!

Chenghao Yang (@chrome1996) 's Twitter Profile Photo

Have you noticed… 🔍 Aligned LLM generations feel less diverse? 🎯 Base models are decoding-sensitive? 🤔 Generations get more predictable as they progress? 🌲 Tree search fails mid-generation (esp. for reasoning)? We trace these mysteries to LLM probability concentration, and
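
One direct way to watch this concentration (a minimal probe with Hugging Face transformers; gpt2 is just an illustrative choice): track next-token entropy per step and see whether it shrinks as the generation progresses.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative small model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                     output_scores=True, return_dict_in_generate=True)
for step, logits in enumerate(out.scores):
    p = torch.softmax(logits[0], dim=-1)
    entropy = -(p * torch.log(p + 1e-12)).sum().item()
    print(step, round(entropy, 2))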

Aryan Shrivastava (@aryan_shri123) 's Twitter Profile Photo

🤫Jailbreak prompts make aligned LMs produce harmful responses.🤔But is that info linearly decodable? ↗️We show many refused concepts are linearly represented, sometimes persist through instruction-tuning, and may also shape downstream behavior❗ arxiv.org/abs/2507.00239 🧵1/

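"Linearly decodable" here means a linear probe on hidden states recovers the concept; a generic probing sketch (synthetic features stand in for real model activations):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))   # stand-in for hidden states
y = (X[:, 0] > 0).astype(int)    # stand-in concept label

probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
print("probe accuracy:", probe.score(X[150:], y[150:]))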