Javier Rando @ ICLR (@javirandor)'s Twitter Profile
Javier Rando @ ICLR

@javirandor

Red-Teaming LLMs / PhD student @ETH_AI_Center / Prev. research intern @Meta and @nyuniversity / People call me Javi / Vegan 🌱

ID: 1052831027662589952

Link: https://javirando.com
Joined: 18-10-2018 07:57:32

1.1K Tweets

2.2K Followers

720 Following

Asa Cooper Stickland (@asacoopstick)

New paper! The UK AISI has created RepliBench, a benchmark that measures the abilities of frontier AI systems to autonomously replicate, i.e. spread copies of themselves without human help. Our results suggest that models are rapidly improving, and the best frontier models are

Javier Rando @ ICLR (@javirandor)

Our paper was accepted at TMLR. We show how unlearning fails to remove knowledge using finetuning (on safe info), GCG, activation interventions and much more. We need better open-source safeguards!

Javier Rando @ ICLR (@javirandor)

Presenting 2 posters today at ICLR. Come check them out!
10am ➡️ #502: Scalable Extraction of Training Data from Aligned, Production Language Models
3pm ➡️ #324: Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Javier Rando @ ICLR (@javirandor)

Don’t be sad ICLR is ending and come check our poster at #301. We will convince you pre-training poisoning is an important threat 😈

Javier Rando @ ICLR (@javirandor)

Tomorrow I will be in Madrid for an amazing event at UC3M, where I will present some of my views on what challenges lie ahead in AI Security. First time presenting in Spain, very excited! eventos.uc3m.es/131114/program…

Javier Rando @ ICLR (@javirandor)

AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.

Javier Rando @ ICLR (@javirandor)

Career update! I will soon be joining the Safeguards team at Anthropic to work on some of the problems I believe are among the most important for the years ahead.

Jie Zhang (@jiezhang_eth)

1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning---from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).

Florian Tramèr (@florian_tramer)

The trend in recent LLM benchmarks is to make them maximally hard. It's unclear what this tells us about LLM capabilities "in the wild". So we created a math benchmark from real, organic research. A cool benefit: RealMath can be automatically refreshed as new research is published.

Florian Tramèr (@florian_tramer)

Following on Andrej Karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?

Javier Rando @ ICLR (@javirandor)

I think it is going to be very important to understand what role LLMs may play in scaling exploits. This is an amazing first look at this problem!

Niloofar (on faculty job market!) (@niloofar_mire)

We (with Zachary Novack, Jaechul Roh, et al.) are working on #memorization in #audio models and are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin, and Cantonese). You can do more than one! Link ⬇️