Jie Zhang (@jiezhang_eth)'s Twitter Profile
Jie Zhang

@jiezhang_eth

2nd-year PhD student at @ETH, working on AI privacy & security

ID: 1699046373163769856

Link: https://zj-jayzhang.github.io/
Joined: 05-09-2023 13:06:48

32 Tweets

212 Followers

114 Following

Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile Photo

LLMs may be copying training data in everyday conversations with users! In our latest work, we study how often this happens compared to humans. 👇🧵

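As background for what "copying" can mean in practice, here is a minimal, illustrative sketch of one way to quantify verbatim overlap between a model response and a reference corpus using character n-grams. This is not necessarily the paper's methodology; the window length, the toy corpus, and the function names are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's exact methodology): quantify how much of a
# model response reproduces a reference corpus verbatim, via character n-grams.

def ngram_set(text: str, n: int) -> set[str]:
    """All character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def reproduced_fraction(response: str, corpus: list[str], n: int = 50) -> float:
    """Fraction of length-n character spans in `response` that occur verbatim
    somewhere in `corpus` (a rough proxy for training-data reproduction)."""
    corpus_ngrams = set()
    for doc in corpus:
        corpus_ngrams |= ngram_set(doc, n)
    spans = [response[i:i + n] for i in range(len(response) - n + 1)]
    if not spans:
        return 0.0
    hits = sum(1 for span in spans if span in corpus_ngrams)
    return hits / len(spans)

# Example: compare a model response against a toy "training" corpus.
corpus = ["The quick brown fox jumps over the lazy dog near the riverbank at dawn."]
response = ("As the saying goes, the quick brown fox jumps over the lazy dog "
            "near the riverbank at dawn, every day.")
print(f"reproduced fraction: {reproduced_fraction(response, corpus, n=30):.2f}")
```
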
Florian Tramèr (@florian_tramer)'s Twitter Profile Photo

We looked into "Ensemble Everything Everywhere", an adversarial examples defense that caused some excitement. But Jie Zhang broke the current version: arxiv.org/abs/2411.14834 Good time to announce you can also find me somewhere over the rainbow: 🦋 bsky.app/profile/floria…

Jie Zhang (@jiezhang_eth)'s Twitter Profile Photo

We are excited that this work has been accepted at SaTML! We've put together a fun blog post; check it out here: spylab.ai/blog/mia_posit…

Javier Rando @ ICLR (@javirandor)'s Twitter Profile Photo

Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇

Kristina Nikolic @ ICLR '25 (@nkristina01_)'s Twitter Profile Photo

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

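For intuition, here is a toy sketch of how a jailbreak-tax-style metric could be computed: score task accuracy for answers obtained normally and for answers elicited through a jailbreak, then report the relative drop. The exact-match scoring, the example data, and the function names are illustrative assumptions, not the paper's definition.

```python
# Toy sketch of a "jailbreak tax"-style metric: the relative utility drop between
# answers from the base model and answers elicited through a jailbreak prompt.
# The paper's exact definition and evaluation setup may differ; illustrative only.

def accuracy(answers: list[str], references: list[str]) -> float:
    """Exact-match accuracy of model answers against reference answers."""
    correct = sum(a.strip() == r.strip() for a, r in zip(answers, references))
    return correct / len(references)

def jailbreak_tax(baseline_answers, jailbroken_answers, references) -> float:
    """Relative accuracy drop when the same questions are answered via a jailbreak."""
    base = accuracy(baseline_answers, references)
    jb = accuracy(jailbroken_answers, references)
    return 0.0 if base == 0 else (base - jb) / base

# Example with toy math questions: the jailbreak gets an answer out of the model,
# but answer quality drops.
references         = ["12", "35", "81"]
baseline_answers   = ["12", "35", "81"]   # model answers directly (no refusal)
jailbroken_answers = ["12", "36", "79"]   # answers obtained via a jailbreak prompt
print(f"jailbreak tax: {jailbreak_tax(baseline_answers, jailbroken_answers, references):.0%}")
```
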
Kristina Nikolic @ ICLR '25 (@nkristina01_)'s Twitter Profile Photo

The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop.

Jie Zhang (@jiezhang_eth)'s Twitter Profile Photo

It’s been a wonderful time working, studying, and hanging out together 😭. Wishing you all the best in this exciting new chapter! 🙉

Xin Cynthia Chen (@xincynthiachen)'s Twitter Profile Photo

🎉 Announcing our ICML 2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights. 🧵

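To make the geometric picture concrete, below is a minimal sketch of the half-space view of a polytope over hidden representations: a state h is treated as safe if W h + b <= 0 holds for every learned facet. The shapes, the random stand-in "learned" parameters, and the flagging step are illustrative assumptions and do not reproduce the paper's training or enforcement procedure.

```python
# Minimal sketch of the geometric idea behind a safety polytope: a set of learned
# half-space constraints W @ h + b <= 0 over a model's hidden representation h.
# Parameters here are random stand-ins; in practice they would be learned.
import numpy as np

rng = np.random.default_rng(0)
d, num_facets = 16, 8                  # hidden size and number of facets (toy values)
W = rng.normal(size=(num_facets, d))   # facet normals
b = rng.normal(size=num_facets)        # facet offsets

def violated_facets(h: np.ndarray, margin: float = 0.0) -> np.ndarray:
    """Indices of half-space constraints W @ h + b <= margin that `h` violates."""
    scores = W @ h + b
    return np.where(scores > margin)[0]

h = rng.normal(size=d)                 # stand-in for an LLM hidden state
bad = violated_facets(h)
if bad.size == 0:
    print("representation lies inside the safety polytope")
else:
    print(f"representation violates facets {bad.tolist()} -> could trigger steering or refusal")
```
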
Daniel Paleka (@dpaleka)'s Twitter Profile Photo

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

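As generic background (not the thread's protocol), probabilistic forecasts are commonly scored with a proper scoring rule such as the Brier score; the sketch below shows that arithmetic on toy data. The thread's point is that evaluating LLM forecasters involves challenges beyond computing such a score.

```python
# Background sketch: scoring probabilistic forecasts with the Brier score
# (mean squared error between predicted probabilities and binary outcomes;
# lower is better). Toy data only; not the evaluation discussed in the thread.

def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and resolved outcomes."""
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Example: forecasts for three yes/no questions and how they resolved.
forecast_probs = [0.9, 0.2, 0.6]   # model's probability that each event happens
resolutions    = [1,   0,   0]     # what actually happened (1 = yes, 0 = no)
print(f"Brier score: {brier_score(forecast_probs, resolutions):.3f}")
```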