Lucy Li (@lucy3_li) 's Twitter Profile
Lucy Li

@lucy3_li

PhD student @berkeley_ai, @BerkeleyISchool. Prev @allen_ai, @MSFTResearch, and @stanfordnlp. More talkative on lucy3.bsky.social

ID: 861417356756533248

Link: http://lucy3.github.io · Joined: 08-05-2017 03:07:57

3.3K Tweets

4.4K Followers

1.1K Following

Nishant Balepur (@nishantbalepur) 's Twitter Profile Photo

🚨 New Position Paper 🚨 Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬 We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠 Here's why MCQA evals are broken, and how to fix them 🧵

Serina Chang (@serinachang5) 's Twitter Profile Photo

Excited to have two papers accepted to ACL 2025 main! 🎉 1. ChatBench with jake hofman Ashton Anderson - we conduct a large-scale user study converting static benchmark questions into human-AI conversations, showing how benchmarks fail to predict human-AI outcomes.

Hanna Wallach (@hannawallach.bsky.social) (@hannawallach) 's Twitter Profile Photo

Exciting news: the Fairness, Accountability, Transparency and Ethics (FATE) group at Microsoft Research NYC is hiring a predoctoral fellow!!! 🎉 microsoft.com/en-us/research…

Andrew Piper (@_akpiper) 's Twitter Profile Photo

Do you love children's books? Well then come over to our new Citizen Science project: Picturing Children's Stories. Help us annotate tens of thousands of book illustrations to understand the history of childhood and visual storytelling.

Yapei Chang (@yapeichang) 's Twitter Profile Photo

🤔 Can simple string-matching metrics like BLEU rival reward models for LLM alignment? 🔍 We show that given access to a reference, BLEU can match reward models in human preference agreement, and even train LLMs competitively with them using GRPO. 🫐 Introducing BLEUBERI:
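The claim above, that a reference-based string-matching metric can stand in for a reward model, is easiest to see with BLEU itself. Below is a minimal, self-contained sketch of sentence-level BLEU (clipped n-gram precisions with uniform weights plus a brevity penalty); it is a toy illustration, not the BLEUBERI implementation, and real evaluations typically use a library such as sacrebleu.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (uniform weights) times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: an n-gram counts at most as often as it
        # appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # Epsilon smoothing so one empty order doesn't zero the whole score.
        log_precisions.append(math.log((overlap + 1e-9) / total))
    # Brevity penalty discourages trivially short hypotheses.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

perfect = sentence_bleu("the cat sat on the mat", "the cat sat on the mat")  # ≈ 1.0
partial = sentence_bleu("the cat sat on", "the cat sat on the mat")          # ≈ 0.61
```

In a GRPO-style setup, a scalar like this, computed between a sampled rollout and the gold reference, would simply take the place of the reward model's score.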

Myra Cheng (@chengmyra1) 's Twitter Profile Photo

Dear ChatGPT, Am I the Asshole? While Reddit users might say yes, your favorite LLM probably won’t. We present Social Sycophancy: a new way to understand and measure sycophancy as how LLMs overly preserve users' self-image.

Michael Black (@michael_j_black) 's Twitter Profile Photo

If you're an international PhD student at Harvard studying computer vision and your visa is cancelled, reach out to me or others in Europe. Don't despair. I'm sure we can find you a great place to carry on your research.

Kiran Garimella (@gvrkiran) 's Twitter Profile Photo

this paper quantitatively shows what someone told me: "everyone loves interdisciplinary but no one will give u a job if you are interdisciplinary" Hiring at top universities rewards disciplinary loyalty over interdisciplinary breadth. Things are changing. arxiv.org/abs/2503.21912

Divya Siddarth (@divyasiddarth) 's Twitter Profile Photo

As we do societal evals at CIP (public health, AI relationships, democracy, etc.) across regional languages, we've spent a lot of time dealing with how brittle LLM judge pipelines are. Stoked to share an open-source test suite (blog + code) we’ve built to stress-test ours before

Kayo Yin (@kayo_yin) 's Twitter Profile Photo

Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🧠🎉 How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach? 🌐 sites.google.com/berkeley.edu/p… 📅 Submit by June 23rd

Lucy Li (@lucy3_li) 's Twitter Profile Photo

"Tell, Don't Show" was accepted to #ACL2025 Findings! Our simple approach for literary topic modeling combines the new (language models) with the old (classic LDA) to yield better topics. A possible addition to your CSS/DH research 🛠️ box ✨📚 arxiv.org/abs/2505.23166

"Tell, Don't Show" was accepted to #ACL2025 Findings! 

Our simple approach for literary topic modeling combines the new (language models) with the old (classic LDA) to yield better topics. A possible addition to your CSS/DH research 🛠️ box

✨📚 arxiv.org/abs/2505.23166
zhou Yu (@zhou_yu_ai) 's Twitter Profile Photo

I wrote this blog post to share practical tips on how academics can collaborate with industry to explore alternative funding sources. Amid all the government cuts, I hope this can help other faculty. Feel free to reach out to me if you need more help. The universe conspires

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

🤝 Humans + AI = Better together? Our #ACL2025 tutorial offers an interdisciplinary overview of human-AI collaboration to explore its goals, evaluation, and societal impacts 🤖

Angelina Wang @angelinawang.bsky.social (@ang3linawang) 's Twitter Profile Photo

Have you ever felt that AI fairness was too strict, enforcing fairness when it didn’t seem necessary? How about too narrow, missing a wide range of important harms? We argue that the way to address both of these critiques is to discriminate more 🧵

Shaily (@shaily99) 's Twitter Profile Photo

🖋️ Curious how writing differs across (research) cultures? 🚩 Tired of “cultural” evals that don't consult people? We engaged with researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗ 📜 arxiv.org/abs/2506.00784 1/11

Morgan Klaus Scheuerman, PhD (he/him) (@morganklauss) 's Twitter Profile Photo

How can ethical principles translate to the massive data used to train foundation models, like generative AI? Our #CSCW2025 workshop aims to explore how best to define the future of ethical responsibility in large-scale datasets for FM training. Apply here: tinyurl.com/CSCW-data

Percy Liang (@percyliang) 's Twitter Profile Photo

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, and Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything: