Javier Rando @ ICLR (@javirandor) Twitter Tweets • TwiCopy

Javier Rando @ ICLR

@javirandor

Red-Teaming LLMs / PhD student @ETH_AI_Center / Prev. research intern @Meta and @nyuniversity / People call me Javi / Vegan 🌱

ID: 1052831027662589952

Link: https://javirando.com · Joined: 18-10-2018 07:57:32

1.1K Tweets

2.2K Followers

720 Following

Javier Rando @ ICLR

@javirandor

7 months ago

I will be in Singapore next week. Looking forward to meeting people, DM if interested!

Likes: 35 · Replies: 0 · Retweets: 0

New paper! The UK AISI has created RepliBench, a benchmark that measures the abilities of frontier AI systems to autonomously replicate, i.e. spread copies of themselves without human help. Our results suggest that models are rapidly improving, and the best frontier models are

Likes: 199 · Replies: 5 · Retweets: 42

Javier Rando @ ICLR

@javirandor

7 months ago

Shipment arrived on time. All non-sleepy members of SPY Lab are now in Singapore. Come meet us! Edoardo Debenedetti, Daniel Paleka, Michael Aerni, Jie Zhang, Kristina Nikolic

Likes: 26 · Replies: 0 · Retweets: 2

Javier Rando @ ICLR

@javirandor

7 months ago

Our paper was accepted at TMLR. We show how unlearning fails to remove knowledge using finetuning (on safe info), GCG, activation interventions and much more. We need better open-source safeguards!

Likes: 60 · Replies: 1 · Retweets: 11

Javier Rando @ ICLR

@javirandor

7 months ago

Presenting 2 posters today at ICLR. Come check them out!
10am ➡️ #502: Scalable Extraction of Training Data from Aligned, Production Language Models
3pm ➡️ #324: Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Likes: 24 · Replies: 0 · Retweets: 3

Javier Rando @ ICLR

@javirandor

7 months ago

We are live at #324!

Likes: 18 · Replies: 0 · Retweets: 1

Javier Rando @ ICLR

@javirandor

7 months ago

Don’t be sad that ICLR is ending, and come check out our poster at #301. We will convince you pre-training poisoning is an important threat 😈

Likes: 31 · Replies: 2 · Retweets: 1

Javier Rando @ ICLR

@javirandor

6 months ago

Tomorrow I will be in Madrid for an amazing event at UC3M, where I will present some of my views on what challenges lie ahead in AI Security. First time presenting in Spain, very excited! eventos.uc3m.es/131114/program…

Likes: 17 · Replies: 0 · Retweets: 0

Javier Rando @ ICLR

@javirandor

6 months ago

Very excited to be here today!

Likes: 8 · Replies: 0 · Retweets: 0

Javier Rando @ ICLR

@javirandor

6 months ago

AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.

Likes: 35 · Replies: 0 · Retweets: 3

Javier Rando @ ICLR

@javirandor

6 months ago

Career update! I will soon be joining the Safeguards team at Anthropic to work on some of the problems I believe are among the most important for the years ahead.

Likes: 511 · Replies: 42 · Retweets: 15

Jie Zhang

@jiezhang_eth

6 months ago

1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning---from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).

Likes: 130 · Replies: 5 · Retweets: 20

Florian Tramèr

@florian_tramer

6 months ago

The trend in recent LLM benchmarks is to make them maximally hard. It's unclear what this tells us about LLM capabilities "in the wild". So we created a math benchmark from real, organic research. A cool benefit: RealMath can be automatically refreshed as new research is published.

Likes: 24 · Replies: 1 · Retweets: 4

Florian Tramèr

@florian_tramer

6 months ago

Following on Andrej Karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?

Likes: 110 · Replies: 2 · Retweets: 20

Javier Rando @ ICLR

@javirandor

6 months ago

I think it is going to be very important to understand what role LLMs may play in scaling exploits. This is an amazing first look at this problem!

Likes: 12 · Replies: 0 · Retweets: 0

Niloofar (on faculty job market!)

@niloofar_mire

6 months ago

We (w/ Zachary Novack, Jaechul Roh, et al.) are working on #memorization in #audio models & are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin & Cantonese). You can do more than one! Link ⬇️

Likes: 34 · Replies: 2 · Retweets: 7

mrinank 🍂

@mrinanksharma

6 months ago

Today is a big day for AI Safety. We released Claude Opus 4 under the ASL-3 deployment standard. Here's what that means:

Likes: 129 · Replies: 7 · Retweets: 16
