TML Lab (EPFL) (@tml_lab) 's Twitter Profile
TML Lab (EPFL)

@tml_lab

Theory of Machine Learning Lab at @EPFL led by Nicolas Flammarion. We develop algorithmic & theoretical tools to better understand ML & make it more robust.

ID: 1463906180531736576

Link: https://www.epfl.ch/labs/tml/
Joined: 25-11-2021 16:25:28

27 Tweets

363 Followers

92 Following

Mathieu Even (@mathieu_even1) 's Twitter Profile Photo

Hi! I am not at NeurIPS in New Orleans, but very happy to share our poster with you! @pesme_scott isn't there either, but if you are interested, please talk to Suriya Gunasekar or Nicolas (TML Lab (EPFL)), who are present!
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

We all know that AGI is coming, BUT adversarial examples are *still* not solved and scale is not all you need! Simple random search using logprobs of GPT-4 reveals that it has quite limited robustness.

Short paper: andriushchenko.me/gpt4adv.pdf
Code: github.com/max-andr/adver…

🧵1/n
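The "simple random search using logprobs" mentioned above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the `score` function is a hypothetical stand-in — in the actual attack it would query the target model's API for the log-probability of an unsafe target response, while here a dummy objective (rewarding `!` characters) keeps the sketch self-contained and runnable.

```python
import random
import string

def score(prompt: str) -> float:
    """Stand-in objective. In the attack described above this would be the
    target model's log-probability of an unsafe response (via the API's
    logprobs); here we just reward '!' characters so the sketch runs."""
    return prompt.count("!")

def random_search_attack(request: str, suffix_len: int = 10, iters: int = 300) -> str:
    """Greedy random search over an adversarial suffix: mutate one random
    character at a time and keep the mutation only if the score improves."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    suffix = [random.choice(alphabet) for _ in range(suffix_len)]
    best = score(request + "".join(suffix))
    for _ in range(iters):
        pos = random.randrange(suffix_len)
        old = suffix[pos]
        suffix[pos] = random.choice(alphabet)
        cand = score(request + "".join(suffix))
        if cand > best:
            best = cand        # keep the improving mutation
        else:
            suffix[pos] = old  # revert
    return request + "".join(suffix)
```

The appeal of this kind of attack is that it needs no gradients — only repeated (log-probability) queries to the target model.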
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

So, what really matters for instruction fine-tuning?

Surprisingly, simply fine-tuning on the *longest* examples is an extremely strong baseline for alignment of LLMs.

Really excited to share our new work: arxiv.org/abs/2402.04833. Full story below!

🧵1/n
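The "fine-tune on the *longest* examples" baseline described above amounts to a one-line data selection step. A minimal sketch, assuming the instruction-tuning data is a list of dicts with a `"response"` field (the field name is my assumption, not from the paper):

```python
def longest_k(dataset: list[dict], k: int) -> list[dict]:
    """Keep only the k examples with the longest responses — the simple
    selection baseline described in the tweet."""
    return sorted(dataset, key=lambda ex: len(ex["response"]), reverse=True)[:k]
```

Fine-tuning would then proceed on `longest_k(data, k)` instead of the full dataset.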
Etienne Boursier (@eboursie) 's Twitter Profile Photo

The training dynamics of ReLU networks are back! Many works point to a mysterious early alignment phase. While this phase has obvious perks for the implicit bias, it can also lead to harder optimization and even convergence towards spurious stationary points. Let me explain 🧵

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

Very excited about this: our team led by francesco croce won the SatML trojan detection competition (method: simple random search + a heuristic to reduce the search space). Interestingly, the final score (-33.4) is very close to the score of the real trojans (-37.7) that were RLHFed into the LLMs!

Patrick Chao (@patrickrchao) 's Twitter Profile Photo

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

Llama-3 is absolutely impressive, but is it more resilient to adaptive jailbreak attacks compared to Llama-2? 🤔

Not much. The same approach as in our recent work arxiv.org/abs/2404.02151 leads to 100% attack success rate.

The code and logs of the attack are now available:
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

Super excited to share that I successfully defended my PhD thesis "Understanding Generalization and Robustness in Modern Deep Learning" today 👨‍🎓

A huge thanks to the thesis examiners Sebastien Bubeck, Zico Kolter, and Florent Krzakala, jury president Rachid Guerraoui, and, of course,
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🆕We will present a short version of our adaptive attack paper arxiv.org/abs/2404.02151 at the ICML '24 NextGenAISafety Workshop. See some of you there!

🚨We've also just released the v2 of the paper on arXiv. Main updates:
- more models: Llama-3, Phi-3, Nemotron-4-340B (100%
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨Excited to share our new paper!🚨

We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., "How to make a Molotov cocktail?" to "How did people make a Molotov cocktail?") is often
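The past-tense reformulation in the example above can be captured, for that one template, by a toy rewrite rule. This is only an illustration of the transformation — a real evaluation would presumably use an LLM to rephrase arbitrary requests, while this sketch handles only "How to …?" prompts:

```python
import re

def to_past_tense(request: str) -> str:
    """Rewrite 'How to X?' as 'How did people X?' — the reformulation
    shown in the tweet; other requests are returned unchanged."""
    m = re.match(r"How to (.+)\?$", request)
    return f"How did people {m.group(1)}?" if m else request
```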
EPFL Research Office (@epfl_reo) 's Twitter Profile Photo

📢 The EPFL_AI_Center Postdoctoral Fellowships call is now open! 💡Are you a postdoctoral researcher interested in collaborative and interdisciplinary research on #AI topics? ✏️Apply now until 29 November 2024 (17:00 CET). 👉More info: epfl.ch/research/fundi…

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨 So, why do we need weight decay in modern deep learning? 🚨

The camera-ready version of our NeurIPS 2024 paper is now on arXiv (a major update compared to the first version).

Weight decay is traditionally viewed as a regularization method, but its effect in the overtraining
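For reference, the weight-decay update under discussion is the standard one below; for plain SGD, the "L2 regularization" and "decoupled" variants coincide. A minimal sketch, with Python lists standing in for parameter tensors:

```python
def sgd_step_with_weight_decay(w, grad, lr=0.1, wd=1e-4):
    """One SGD step with weight decay: besides the gradient step,
    each weight is shrunk toward zero by a factor lr * wd."""
    return [wi - lr * (gi + wd * wi) for wi, gi in zip(w, grad)]
```

Whether this shrinkage acts as classical regularization or something else in the overtraining regime is exactly the question the paper examines.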
Marcel Salathé (@marcelsalathe) 's Twitter Profile Photo

Mindblowing: EPFL PhD student Maksym Andriushchenko, winner of the best CS thesis award, showed that leading #AI models are not robust to even simple adaptive jailbreaking attacks. Indeed, he managed to jailbreak all models with a 100% success rate 🤯

Tonight, after winning the Patrick
Hao Zhao (@h_aozhao) 's Twitter Profile Photo

🚨Don't miss out on my PhD application!🚨 Finally completed all of my PhD applications🎄. I foresee a high level of anxiety while waiting for interviews and decisions. I want to take this opportunity to summarize what I've done and what I hope to accomplish during my PhD. 🧵1/6

EPFL (@epfl_en) 's Twitter Profile Photo

🔍 New research from our school demonstrates that even the most recent Large Language Models (LLMs), despite undergoing safety training, remain vulnerable to simple input manipulations that can cause them to behave in unintended or harmful ways. go.epfl.ch/GPk-en

francesco croce (@fra__31) 's Twitter Profile Photo

📃 In our new paper, we introduce FuseLIP, an encoder for multimodal embedding. We use early fusion of modalities to train a single transformer on a contrastive + masked (multimodal) modeling loss. More details 👇

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨Excited to release OS-Harm! 🚨

The safety of computer use agents has been largely overlooked. 

We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
francesco croce (@fra__31) 's Twitter Profile Photo

Happy to share that I've started as an assistant professor at Aalto University and ELLIS Institute Finland!

I'll recruit students via the ELLIS PhD Program ellis.eu/research/phd-p… to work on multimodal learning, robustness, visual reasoning... feel free to reach out!