Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile
Michael Aerni @ ICLR

@aernimichael

AI privacy and security | PhD student @CSatETH | Ask me about coffee ☕️

ID: 927860226811981824

Website: https://michaelaerni.com | Joined: 07-11-2017 11:28:11

80 Tweets

165 Followers

157 Following

Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile Photo

🔥 I'm thrilled that I'll be spending next year in the group of Florian Tramèr at ETH Zurich, working on privacy and memorization in ML 🔥 (Not an announcement, just what I usually do. It's a great group full of amazing people, and I'm thrilled to work with them every day!)

Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile Photo

I am in beautiful Vancouver for #NeurIPS2024 with those amazing folks! Say hi if you want to chat about ML privacy and security (or speciality ☕)

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

I've been thinking about Privacy & LLMs work for 2025 - here are 5 research directions and some key papers on privacy/memorization to get started: 🧵

Javier Rando @ ICLR (@javirandor)'s Twitter Profile Photo

Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇

ETH CS Department (@csateth)'s Twitter Profile Photo

🔎 Can #AI models be "cured" after a cyber attack? New research from Florian Tramèr's Secure and Private AI Lab reveals that removing poisoned data from AI is harder than we think – harmful info isn't erased, just hidden. So how do we make AI truly secure? bit.ly/41bJB05

Edoardo Debenedetti (@edoardo_debe)'s Twitter Profile Photo

1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!

Kristina Nikolic @ ICLR '25 (@nkristina01_)'s Twitter Profile Photo

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile Photo

IMO it's very important to measure LLM utility in tasks that we actually want them to perform well on, not just hard sandbox tasks. This is an excellent benchmark that does exactly that!

Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile Photo

Imagine LLMs could tell you the future. But properly evaluating forecasts is incredibly tricky! This paper contains so many interesting thoughts about all the things that can go wrong.

Kristina Nikolic @ ICLR '25 (@nkristina01_)'s Twitter Profile Photo

We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML; it measures how useful jailbreak outputs are. See you Tuesday 11am at East #804. I'll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!