Jie Zhang (@jiezhang_eth)'s Twitter Profile
Jie Zhang

@jiezhang_eth

2nd-year PhD student at @ETH, working on AI privacy & security

ID: 1699046373163769856

Link: https://zj-jayzhang.github.io/
Joined: 05-09-2023 13:06:48

32 Tweets

212 Followers

114 Following

Michael Aerni @ ICLR (@aernimichael)'s Twitter Profile Photo

LLMs may be copying training data in everyday conversations with users! In our latest work, we study how often this happens compared to humans. 👇🧵

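As background for what "copying" can mean in practice, here is a minimal, illustrative sketch of one way to quantify verbatim overlap between a model response and a reference corpus using character n-grams. This is not necessarily the paper's methodology; the window length, the toy corpus, and the function names are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's exact methodology): quantify how much of a
# model response reproduces a reference corpus verbatim, via character n-grams.

def ngram_set(text: str, n: int) -> set[str]:
    """All character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def reproduced_fraction(response: str, corpus: list[str], n: int = 50) -> float:
    """Fraction of length-n character spans in `response` that occur verbatim
    somewhere in `corpus` (a rough proxy for training-data reproduction)."""
    corpus_ngrams = set()
    for doc in corpus:
        corpus_ngrams |= ngram_set(doc, n)
    spans = [response[i:i + n] for i in range(len(response) - n + 1)]
    if not spans:
        return 0.0
    hits = sum(1 for span in spans if span in corpus_ngrams)
    return hits / len(spans)

# Example: compare a model response against a toy "training" corpus.
corpus = ["The quick brown fox jumps over the lazy dog near the riverbank at dawn."]
response = ("As the saying goes, the quick brown fox jumps over the lazy dog "
            "near the riverbank at dawn, every day.")
print(f"reproduced fraction: {reproduced_fraction(response, corpus, n=30):.2f}")
```
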
Florian Tramèr (@florian_tramer)'s Twitter Profile Photo

We looked into "Ensemble Everything Everywhere", an adversarial examples defense that caused some excitement. But Jie Zhang broke the current version: arxiv.org/abs/2411.14834 Good time to announce you can also find me somewhere over the rainbow: 🦋 bsky.app/profile/floria…

Jie Zhang (@jiezhang_eth)'s Twitter Profile Photo

We are excited that this work has been accepted at SaTML! We've put together a fun blog post; check it out here: spylab.ai/blog/mia_posit…

Javier Rando @ ICLR (@javirandor)'s Twitter Profile Photo

Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇

Kristina Nikolic @ ICLR '25 (@nkristina01_)'s Twitter Profile Photo

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

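For intuition, here is a toy sketch of how a jailbreak-tax-style metric could be computed: score task accuracy for answers obtained normally and for answers elicited through a jailbreak, then report the relative drop. The exact-match scoring, the example data, and the function names are illustrative assumptions, not the paper's definition.

```python
# Toy sketch of a "jailbreak tax"-style metric: the relative utility drop between
# answers from the base model and answers elicited through a jailbreak prompt.
# The paper's exact definition and evaluation setup may differ; illustrative only.

def accuracy(answers: list[str], references: list[str]) -> float:
    """Exact-match accuracy of model answers against reference answers."""
    correct = sum(a.strip() == r.strip() for a, r in zip(answers, references))
    return correct / len(references)

def jailbreak_tax(baseline_answers, jailbroken_answers, references) -> float:
    """Relative accuracy drop when the same questions are answered via a jailbreak."""
    base = accuracy(baseline_answers, references)
    jb = accuracy(jailbroken_answers, references)
    return 0.0 if base == 0 else (base - jb) / base

# Example with toy math questions: the jailbreak gets an answer out of the model,
# but answer quality drops.
references         = ["12", "35", "81"]
baseline_answers   = ["12", "35", "81"]   # model answers directly (no refusal)
jailbroken_answers = ["12", "36", "79"]   # answers obtained via a jailbreak prompt
print(f"jailbreak tax: {jailbreak_tax(baseline_answers, jailbroken_answers, references):.0%}")
```
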
Kristina Nikolic @ ICLR '25 (@nkristina01_)'s Twitter Profile Photo

The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop.

Jie Zhang (@jiezhang_eth)'s Twitter Profile Photo

It’s been a wonderful time working, studying, and hanging out together 😭. Wishing you all the best in this exciting new chapter! 🙉

Xin Cynthia Chen (@xincynthiachen)'s Twitter Profile Photo

🎉 Announcing our ICML 2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights. 🧵

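To make the geometric picture concrete, below is a minimal sketch of the half-space view of a polytope over hidden representations: a state h is treated as safe if W h + b <= 0 holds for every learned facet. The shapes, the random stand-in "learned" parameters, and the flagging step are illustrative assumptions and do not reproduce the paper's training or enforcement procedure.

```python
# Minimal sketch of the geometric idea behind a safety polytope: a set of learned
# half-space constraints W @ h + b <= 0 over a model's hidden representation h.
# Parameters here are random stand-ins; in practice they would be learned.
import numpy as np

rng = np.random.default_rng(0)
d, num_facets = 16, 8                  # hidden size and number of facets (toy values)
W = rng.normal(size=(num_facets, d))   # facet normals
b = rng.normal(size=num_facets)        # facet offsets

def violated_facets(h: np.ndarray, margin: float = 0.0) -> np.ndarray:
    """Indices of half-space constraints W @ h + b <= margin that `h` violates."""
    scores = W @ h + b
    return np.where(scores > margin)[0]

h = rng.normal(size=d)                 # stand-in for an LLM hidden state
bad = violated_facets(h)
if bad.size == 0:
    print("representation lies inside the safety polytope")
else:
    print(f"representation violates facets {bad.tolist()} -> could trigger steering or refusal")
```
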
Daniel Paleka (@dpaleka)'s Twitter Profile Photo

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

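As generic background (not the thread's protocol), probabilistic forecasts are commonly scored with a proper scoring rule such as the Brier score; the sketch below shows that arithmetic on toy data. The thread's point is that evaluating LLM forecasters involves challenges beyond computing such a score.

```python
# Background sketch: scoring probabilistic forecasts with the Brier score
# (mean squared error between predicted probabilities and binary outcomes;
# lower is better). Toy data only; not the evaluation discussed in the thread.

def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and resolved outcomes."""
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Example: forecasts for three yes/no questions and how they resolved.
forecast_probs = [0.9, 0.2, 0.6]   # model's probability that each event happens
resolutions    = [1,   0,   0]     # what actually happened (1 = yes, 0 = no)
print(f"Brier score: {brier_score(forecast_probs, resolutions):.3f}")
```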