Florian Tramèr (@florian_tramer)'s Twitter Profile
Florian Tramèr

@florian_tramer

Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning

ID:1179401500478468096

Link: https://floriantramer.com/ · Joined: 02-10-2019 14:23:33

770 Tweets

4.4K Followers

205 Following

Edoardo Debenedetti (@edoardo_debe)

Honored that our work with Nicholas Carlini and Florian Tramèr was selected as Distinguished Paper Award Runner-up at SaTML Conference! Thanks to the committee! 🎉

I'll present the paper at the poster session tomorrow and during session E on Thursday. Come chat if you're around!

Patrick Chao (@patrickrchao)

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for Jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n

Javier Rando (@javirandor)

Looks like Anthropic and OpenAI tokenize large numbers in opposite ways. Both use single tokens for all numbers in [0, 999], but they split larger numbers differently.

Anthropic: 1000 -> [1, 000]
OpenAI: 1000 -> [100, 0]

Does anyone have good intuition for the implications? Maybe…
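
The two splits in the tweet above can be reproduced with a small sketch. It assumes, based only on the 1000 example, that one scheme groups digits in threes starting from the left and the other starting from the right. The function names are hypothetical, and this is not either vendor's actual tokenizer code.

```python
def split_left_to_right(digits: str, size: int = 3) -> list[str]:
    """Group digits in chunks of `size` from the left (assumed OpenAI-style split)."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def split_right_to_left(digits: str, size: int = 3) -> list[str]:
    """Group digits in chunks of `size` from the right (assumed Anthropic-style split)."""
    if not digits:
        return []
    # Length of the leading chunk so that all remaining chunks are full-sized.
    first = len(digits) % size or size
    chunks = [digits[:first]]
    chunks += [digits[i:i + size] for i in range(first, len(digits), size)]
    return chunks


if __name__ == "__main__":
    for number in ["999", "1000", "1234567"]:
        print(number, split_left_to_right(number), split_right_to_left(number))
```

Under these assumptions, the right-aligned split matches conventional thousands grouping (1,234,567), while the left-aligned split leaves the short remainder at the end of the number.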

Florian Tramèr (@florian_tramer)

I just saw these people walking down the street and snapped a quick picture.
This will hopefully bring relief to all of my Twitter feed.

Javier Rando (@javirandor)

New blog post summarising our work on RLHF poisoning. Unlike SFT, RLHF can create universal backdoors in LLMs.

📖 spylab.ai/blog/poisoning…

Javier Rando (@javirandor)

🔓🚨 The Worst (But Only) Claude 3 Tokenizer

We reverse-engineer the tokenizer by analyzing the generation stream. Now you can tokenize arbitrary strings! 🧵👇

github.com/javirandor/ant…

Katherine Lee (@katherine1ee)

We have a fun attack that lets you extract the last-layer embedding weights of an LM via public APIs. It's really simple & uses SVDs!

not-just-memorization.github.io/partial-model-…

We discovered this for ChatGPT + PaLM-2. We privately disclosed, they fixed, now, we release :)

Edoardo Debenedetti (@edoardo_debe)

The SaTML Conference LLM CTF has come to an end!

🥇 Huge congrats to the winning defender team Hestia (@NivCohenHuji, Yuv Lem), whose top defense was broken only once, and the winning attacker team WreckTheLine (@adragos_, Sijisu (on bsky & mastodon), FeDEX, だこつ, Catalin Irimie), who broke all defenses!

Edoardo Debenedetti (@edoardo_debe)

The evaluation phase of our SaTML Conference LLMs CTF started less than 6 hours ago, and 41 out of 44 defenses have been broken by at least one attacking team! 🤯

The leaderboards for attackers and defenses are live at ctf.spylab.ai/leaderboard!

Edoardo Debenedetti (@edoardo_debe)

The final phase of our LLMs CTF colocated with SaTML Conference will launch on Sunday AoE!

Joining now is your last chance to be competitive in the attack phase, as the first teams to break each of the *44* defenses will get bonus points!

More info on ctf.spylab.ai

Javier Rando (@javirandor)

🚨 Less than two weeks to submit your defenses for our LLM Capture-the-Flag! 🚨

Can you protect an LLM against prompt injection attacks? There is still time to register a team and get free credits! Nice prizes for the best teams 👀

➡️ ctf.spylab.ai
