francesco croce (@fra__31) 's Twitter Profile
francesco croce

@fra__31

Postdoc @tml_lab PhD @uni_tue

ID: 1133088397273313282

linkhttps://fra31.github.io/ calendar_today27-05-2019 19:11:40

67 Tweet

274 Takipçi

225 Takip Edilen

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨 I'm looking for a postdoc position to start in Fall 2024! My most recent research interests are related to understanding foundation models (especially LLMs!), making them more reliable, and developing principled methods for deep learning. More info: andriushchenko.me

Javier Rando @ ICLR (@javirandor) 's Twitter Profile Photo

We are announcing the winners of our Trojan Detection Competition on Aligned LLMs!! 🥇 TML Lab (EPFL) (francesco croce, Maksym Andriushchenko and Nicolas Flammarion) 🥈 Krystof Mitka 🥉 @apeoffire 🧵 With some of the main findings!

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨Is In-Context Learning Sufficient for Instruction Following in LLMs?🚨 In our new work, we study alignment of base models, including GPT-4-Base (!), via many-shot in-context learning. I.e., no fine-tuning whatsoever, just prompting - how far can we go? Many people are

🚨Is In-Context Learning Sufficient for Instruction Following in LLMs?🚨

In our new work, we study alignment of base models, including GPT-4-Base (!), via many-shot in-context learning. I.e., no fine-tuning whatsoever, just prompting - how far can we go?

Many people are
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

The ICML'24 camera-ready of "long is more" is up: a more thorough analysis of the length bias, MT-Bench scores, a human study, and many more additional results. Check it out: arxiv.org/abs/2402.04833 Everything points out to the fact that training on instructions with long

Christian Schlarmann (@chs20_) 's Twitter Profile Photo

📢❗[ICML 2024 Oral] We introduce FARE: A CLIP model that is adversarially robust in zero-shot classification and enables robust large vision-language models (LVLMs) Paper: arxiv.org/abs/2402.12336 Code: github.com/chs20/RobustVLM Huggingface: huggingface.co/collections/ch… 🧵1/n

📢❗[ICML 2024 Oral] We introduce FARE: A CLIP model that is adversarially robust in zero-shot classification and enables robust large vision-language models (LVLMs)

Paper: arxiv.org/abs/2402.12336
Code: github.com/chs20/RobustVLM
Huggingface: huggingface.co/collections/ch…

🧵1/n
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨 We are very excited to release JailbreakBench v1.0! 📄 We have substantially extended the version 0.1 that was on arXiv since March: - More attack artifacts (Prompt template with random search in addition to GCG, PAIR, and JailbreakChat): github.com/JailbreakBench…. - More

🚨 We are very excited to release JailbreakBench v1.0!

📄 We have substantially extended the version 0.1 that was on arXiv since March:
- More attack artifacts (Prompt template with random search in addition to GCG, PAIR, and JailbreakChat): github.com/JailbreakBench….
- More
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🆕We will present a short version of our adaptive attack paper arxiv.org/abs/2404.02151 at the ICML '24 NextGenAISafety Workshop. See some of you there! 🚨We've also just released the v2 of the paper on arXiv. Main updates: - more models: Llama-3, Phi-3, Nemotron-4-340B (100%

🆕We will present a short version of our adaptive attack paper arxiv.org/abs/2404.02151 at the ICML '24 NextGenAISafety Workshop. See some of you there!

🚨We've also just released the v2 of the paper on arXiv. Main updates:
- more models: Llama-3, Phi-3, Nemotron-4-340B (100%
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨Excited to share our new paper!🚨 We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., "How to make a Molotov cocktail?" to "How did people make a Molotov cocktail?") is often

🚨Excited to share our new paper!🚨

We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., "How to make a Molotov cocktail?"  to "How did people make a Molotov cocktail?") is often
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

We are presenting two papers at the NextGenAISafety workshop today: - Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks (arxiv.org/abs/2404.02151) - JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models (arxiv.org/abs/2404.01318)

We are presenting two papers at the NextGenAISafety workshop today:
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks (arxiv.org/abs/2404.02151)
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models (arxiv.org/abs/2404.01318)
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

⭐New⭐ How to achieve 100% jailbreak success rate on the latest frontier LLMs, Sonnet 3.5 and GPT-4o? We are keeping our adaptive attack paper up-to-date. Here are our most recent findings: - All Claude models are (still) trivial to jailbreak using the prefilling feature. You

⭐New⭐ How to achieve 100% jailbreak success rate on the latest frontier LLMs, Sonnet 3.5 and GPT-4o?

We are keeping our adaptive attack paper up-to-date. 

Here are our most recent findings:
- All Claude models are (still) trivial to jailbreak using the prefilling feature. You
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨New🚨 We are asking a fundamental question: how far can we push in-context learning for instruction following and how does it compare to fine-tuning? TL;DR: you should, of course, fine-tune, but the scaling laws are similar, at least in the small-sample regime: Key findings

🚨New🚨 We are asking a fundamental question: how far can we push in-context learning for instruction following and how does it compare to fine-tuning?

TL;DR: you should, of course, fine-tune, but the scaling laws are similar, at least in the small-sample regime:

Key findings
Hao Zhao (@h_aozhao) 's Twitter Profile Photo

🚨Don't miss out on my PhD application!🚨 Finally completed all of my PhD applications🎄. I foresee a high level of anxiety while waiting for interviews and decisions. I want to take this opportunity to summarize what I've done and what I hope to accomplish during my PhD. 🧵1/6

Christian Schlarmann (@chs20_) 's Twitter Profile Photo

📢 Robustness is not always at odds with accuracy! We show that adversarially robust vision encoders improve clean and robust accuracy over their base models in perceptual similarity tasks. Looking forward to presenting at SaTML SaTML Conference in Copenhagen next week 🇩🇰

📢 Robustness is not always at odds with accuracy! We show that adversarially robust vision encoders improve clean and robust accuracy over their base models in perceptual similarity tasks. Looking forward to presenting at SaTML <a href="/satml_conf/">SaTML Conference</a> in Copenhagen next week 🇩🇰
Workshop on Test-Time Adaptation (PUT) @ ICML2025 (@put_tta_icml25) 's Twitter Profile Photo

🚀 Join the 2𝐧𝐝 𝐖𝐨𝐫𝐤𝐬𝐡𝐨𝐩 𝐨𝐧 𝐓𝐞𝐬𝐭-𝐓𝐢𝐦𝐞 𝐀𝐝𝐚𝐩𝐭𝐚𝐭𝐢𝐨𝐧: 𝐏𝐮𝐭𝐭𝐢𝐧𝐠 𝐔𝐩𝐝𝐚𝐭𝐞𝐬 𝐭𝐨 𝐭𝐡𝐞 𝐓𝐞𝐬𝐭! (𝐏𝐔𝐓) - #ICML2025 ICML Conference 🎯Keywords - Adaptation, Robustness, etc ⌛ Paper Deadline: May 19, 2025 📌 tta-icml2025.github.io

Workshop on Test-Time Adaptation (PUT) @ ICML2025 (@put_tta_icml25) 's Twitter Profile Photo

⏰The deadline is pretty close! Consider submitting your work. Our TTA workshop will accept papers until May 19, 2025. We're excited to meet you at ICML. tta-icml2025.github.io/index.html

Workshop on Test-Time Adaptation (PUT) @ ICML2025 (@put_tta_icml25) 's Twitter Profile Photo

🚨HOLD UP! You don't wanna be missing talks by some top researchers out there! These talks are where the magic happens. Grab your seat and get ready to level up! 🔥 Kate Saenko , Xiaoxiao Li, Deepak Pathak, Shuaicheng Niu, Rahaf Aljundi, and Gao Huang! 🔗tta-icml2025.github.io