Joachim Baumann (@joabaum)'s Twitter Profile
Joachim Baumann

@joabaum

PhD student @UZH_en visiting @MilaNLProc | algorithmic fairness | NLP

ID: 1360270456985690113

Link: https://www.ifi.uzh.ch/en/scg/people/Baumann.html | Joined: 12-02-2021 17:08:15

0 Tweets

70 Followers

675 Following

MilaNLP (@milanlproc)

🎉 The MilaNLP lab is excited to present 15 papers and 1 tutorial at #ACL2025 & workshops! Grateful to all our amazing collaborators, see everyone in Vienna! 🚀

fly51fly (@fly51fly)

[CL] Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation
J Baumann, P Röttger, A Urman, A Wendsjö... [Bocconi University & University of Zurich] (2025)
arxiv.org/abs/2509.08825
Sayash Kapoor (@sayashk)

📣New paper: Rigorous AI agent evaluation is much harder than it seems.

For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks.

Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9
Joachim Baumann (@joabaum)

Cool paper by Eddie Yang, confirming our LLM hacking findings (arxiv.org/pdf/2509.08825):
✓ LLMs are brittle data annotators
✓ Downstream conclusions often flip: *LLM hacking risk* is real!
✓ Bias correction methods can help but have tradeoffs
✓ Use human experts whenever possible

Manoel (@manoelribeiro)

The debate over “LLMs as annotators” feels familiar: excitement, backlash, and anxiety about bad science. My take in a new blogpost is that LLMs don’t break measurement; they expose how fragile it already was.

doomscrollingbabel.manoel.xyz/p/labeling-dat…