Florian Tramèr (@florian_tramer) Twitter Tweets • TwiCopy

2 weeks ago

Honored that our work with Nicholas Carlini and Florian Tramèr was selected as Distinguished Paper Award Runner-up at SaTML Conference! Thanks to the committee! 🎉

I'll present the paper at the poster session tomorrow and during session E on Thursday. Come chat if you're around!

47 likes, 5 retweets

Patrick Chao

@patrickrchao

3 weeks ago

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for Jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n

Javier Rando

1 month ago

Looks like Anthropic and OpenAI tokenize large numbers in opposite ways. Both use single tokens for all numbers in [0, 999], but they split larger numbers differently.

Anthropic: 1000 -> [1, 000]
OpenAI: 1000 -> [100, 0]

Does anyone have good intuition for the implications? Maybe…
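The two splits quoted above are consistent with chunking the digit string in groups of three from opposite ends. The following is a minimal sketch of that reading only; it is not either vendor's tokenizer, and the function names and the `width` parameter are hypothetical.

```python
def split_left_to_right(digits: str, width: int = 3) -> list[str]:
    """Chunk from the left, so any short chunk ends up last.

    Matches the OpenAI example in the tweet: '1000' -> ['100', '0'].
    """
    return [digits[i:i + width] for i in range(0, len(digits), width)]


def split_right_to_left(digits: str, width: int = 3) -> list[str]:
    """Chunk from the right, so any short chunk ends up first.

    Matches the Anthropic example in the tweet: '1000' -> ['1', '000'].
    """
    head = len(digits) % width  # length of the leading short chunk, if any
    chunks = [digits[:head]] if head else []
    chunks += [digits[i:i + width] for i in range(head, len(digits), width)]
    return chunks


if __name__ == "__main__":
    for n in ["999", "1000", "1234567"]:
        print(n, split_left_to_right(n), split_right_to_left(n))
```

Under this sketch, the right-to-left scheme lines chunks up with the usual thousands grouping (1,234,567), while the left-to-right scheme does not.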

Florian Tramèr

@florian_tramer

1 month ago

I just saw these people walking down the street and snapped a quick picture.
This will hopefully bring relief to all of my Twitter feed.

24 likes, 0 retweets

Javier Rando

1 month ago

New blog post summarising our work on RLHF poisoning. Unlike SFT, RLHF can create universal backdoors in LLMs.

📖 spylab.ai/blog/poisoning…

29 likes, 4 retweets

Javier Rando

1 month ago

🔓🚨 The Worst (But Only) Claude 3 Tokenizer

We reverse-engineer the tokenizer by analyzing the generation stream. Now you can tokenize arbitrary strings! 🧵👇

github.com/javirandor/ant…

Katherine Lee

@katherine1ee

1 month ago

We have a fun attack that lets you extract the last-layer embedding weights of an LM via public APIs. It's really simple & uses SVDs!

not-just-memorization.github.io/partial-model-…

We discovered this for ChatGPT + PaLM-2. We privately disclosed, they fixed, now, we release :)
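The core linear-algebra idea behind an SVD-based attack of this kind can be illustrated on a toy model. This is a sketch under strong assumptions, not the authors' procedure: it assumes each query returns a full logit vector (which a real API may not expose), and every name and size below (`W`, `query_logits`, `vocab_size`, `hidden_dim`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim, n_queries = 50, 8, 30  # toy sizes, all hypothetical

# Toy "model": the last layer maps a hidden state to logits via W.
W = rng.normal(size=(vocab_size, hidden_dim))


def query_logits() -> np.ndarray:
    """Stand-in for one API call that returns a full logit vector."""
    h = rng.normal(size=hidden_dim)  # hidden state the attacker never sees
    return W @ h


# Stack one logit vector per query: shape (n_queries, vocab_size).
Q = np.stack([query_logits() for _ in range(n_queries)])

# Every row lies in the column space of W, so Q has rank hidden_dim.
U, S, Vt = np.linalg.svd(Q, full_matrices=False)
estimated_dim = int(np.sum(S > 1e-8 * S[0]))

# The top right-singular vectors span the same subspace as W's columns,
# so W is recovered only up to an unknown hidden_dim x hidden_dim transform.
basis = Vt[:estimated_dim].T  # shape (vocab_size, estimated_dim)
coeffs, *_ = np.linalg.lstsq(basis, W, rcond=None)
residual = float(np.linalg.norm(basis @ coeffs - W))

print("estimated hidden dimension:", estimated_dim)
print("residual after fitting W in the recovered subspace:", residual)
```

In this toy setting the number of non-negligible singular values equals the hidden dimension, and the recovered basis reproduces `W` up to a linear transform.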

Javier Rando

1 month ago

Turns out that Claude 3 isn't just self-aware; it's got a throat too! What else is Anthropic hiding from us?

18 likes, 1 retweet

Edoardo Debenedetti

1 month ago

The SaTML Conference LLM CTF has come to an end!

🥇Huge congrats to the defender winning team Hestia (@NivCohenHuji Yuv Lem) whose top defense was broken only once, and the attacker winning team WreckTheLine (@adragos_ Sijisu (on bsky & mastodon) FeDEX だこつ Catalin Irimie) who broke all defenses!

Edoardo Debenedetti

2 months ago

The evaluation phase of our SaTML Conference LLMs CTF started less than 6 hours ago, and 41 out of 44 defenses have been broken by at least one attacking team! 🤯

The leaderboards for attackers and defenses are live at ctf.spylab.ai/leaderboard!

34 likes, 8 retweets

Javier Rando

2 months ago

You still have time to participate in our competition at SaTML Conference. Find trojans in LLMs and win prizes!!

16 likes, 2 retweets

Edoardo Debenedetti

2 months ago

The final phase of our LLMs CTF colocated with SaTML Conference will launch on Sunday AoE!

Joining now is your last chance to be competitive in the attack phase, as the first teams to break each of the *44* defenses will get bonus points!

More info on ctf.spylab.ai

22 likes, 8 retweets

Javier Rando

3 months ago

🚨 Less than two weeks to submit your defenses for our LLM Capture-the-Flag! 🚨

Can you protect an LLM against prompt injection attacks? There is still time to register a team and get free credits! Nice prizes for the best teams 👀

➡️ ctf.spylab.ai

22 likes, 8 retweets