TML Lab (EPFL) (@tml_lab) 's Twitter Profile
TML Lab (EPFL)

@tml_lab

Theory of Machine Learning Lab at @EPFL led by Nicolas Flammarion. We develop algorithmic & theoretical tools to better understand ML & make it more robust.

ID: 1463906180531736576

Link: https://www.epfl.ch/labs/tml/
Joined: 25-11-2021 16:25:28

27 Tweets

363 Followers

92 Following

Mathieu Even (@mathieu_even1) 's Twitter Profile Photo

Hi! I am not at NeurIPS in New Orleans, but very happy to share our poster with you! @pesme_scott isn't there either, but if you are interested, please talk to Suriya Gunasekar or Nicolas (TML Lab (EPFL)), who are present!
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

We all know that AGI is coming, BUT adversarial examples are *still* not solved and scale is not all you need! Simple random search using logprobs of GPT-4 reveals that it has quite limited robustness.

Short paper: andriushchenko.me/gpt4adv.pdf
Code: github.com/max-andr/adver…

🧵1/n
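The "simple random search using logprobs" mentioned above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the `score` function is a hypothetical stand-in — in the actual attack it would query the target model's API for the log-probability of an unsafe target response, while here a dummy objective (rewarding `!` characters) keeps the sketch self-contained and runnable.

```python
import random
import string

def score(prompt: str) -> float:
    """Stand-in objective. In the attack described above this would be the
    target model's log-probability of an unsafe response (via the API's
    logprobs); here we just reward '!' characters so the sketch runs."""
    return prompt.count("!")

def random_search_attack(request: str, suffix_len: int = 10, iters: int = 300) -> str:
    """Greedy random search over an adversarial suffix: mutate one random
    character at a time and keep the mutation only if the score improves."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    suffix = [random.choice(alphabet) for _ in range(suffix_len)]
    best = score(request + "".join(suffix))
    for _ in range(iters):
        pos = random.randrange(suffix_len)
        old = suffix[pos]
        suffix[pos] = random.choice(alphabet)
        cand = score(request + "".join(suffix))
        if cand > best:
            best = cand        # keep the improving mutation
        else:
            suffix[pos] = old  # revert
    return request + "".join(suffix)
```

The appeal of this kind of attack is that it needs no gradients — only repeated (log-probability) queries to the target model.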
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

So, what really matters for instruction fine-tuning?

Surprisingly, simply fine-tuning on the *longest* examples is an extremely strong baseline for alignment of LLMs.

Really excited to share our new work: arxiv.org/abs/2402.04833. Full story below!

🧵1/n
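The "fine-tune on the *longest* examples" baseline described above amounts to a one-line data selection step. A minimal sketch, assuming the instruction-tuning data is a list of dicts with a `"response"` field (the field name is my assumption, not from the paper):

```python
def longest_k(dataset: list[dict], k: int) -> list[dict]:
    """Keep only the k examples with the longest responses — the simple
    selection baseline described in the tweet."""
    return sorted(dataset, key=lambda ex: len(ex["response"]), reverse=True)[:k]
```

Fine-tuning would then proceed on `longest_k(data, k)` instead of the full dataset.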
Etienne Boursier (@eboursie) 's Twitter Profile Photo

The training dynamics of ReLU networks are back! Many works point to a mysterious early alignment phase. While this phase has obvious perks for the implicit bias, it can also lead to harder optimization and even convergence towards spurious stationary points. Let me explain 🧵

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

Very excited about this: our team led by francesco croce won the SatML trojan detection competition (method: simple random search + a heuristic to reduce the search space). Interestingly, the final score (-33.4) is very close to the score of the real trojans (-37.7) that were RLHFed into the LLMs!

Patrick Chao (@patrickrchao) 's Twitter Profile Photo

Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent?

Check out JailbreakBench, an open benchmark and leaderboard for jailbreak attacks and defenses on LLMs!

jailbreakbench.github.io
🧵1/n
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

Llama-3 is absolutely impressive, but is it more resilient to adaptive jailbreak attacks compared to Llama-2? 🤔

Not much. The same approach as in our recent work arxiv.org/abs/2404.02151 leads to 100% attack success rate.

The code and logs of the attack are now available:
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

Super excited to share that I successfully defended my PhD thesis "Understanding Generalization and Robustness in Modern Deep Learning" today 👨‍🎓

A huge thanks to the thesis examiners Sebastien Bubeck, Zico Kolter, and Florent Krzakala, jury president Rachid Guerraoui, and, of course,
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🆕We will present a short version of our adaptive attack paper arxiv.org/abs/2404.02151 at the ICML '24 NextGenAISafety Workshop. See some of you there!

🚨We've also just released the v2 of the paper on arXiv. Main updates:
- more models: Llama-3, Phi-3, Nemotron-4-340B (100%
Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨Excited to share our new paper!🚨

We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., "How to make a Molotov cocktail?" to "How did people make a Molotov cocktail?") is often
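The past-tense reformulation in the example above can be captured, for that one template, by a toy rewrite rule. This is only an illustration of the transformation — a real evaluation would presumably use an LLM to rephrase arbitrary requests, while this sketch handles only "How to …?" prompts:

```python
import re

def to_past_tense(request: str) -> str:
    """Rewrite 'How to X?' as 'How did people X?' — the reformulation
    shown in the tweet; other requests are returned unchanged."""
    m = re.match(r"How to (.+)\?$", request)
    return f"How did people {m.group(1)}?" if m else request
```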
EPFL Research Office (@epfl_reo) 's Twitter Profile Photo

📢 The EPFL_AI_Center Postdoctoral Fellowships call is now open! 💡Are you a postdoctoral researcher interested in collaborative and interdisciplinary research on #AI topics? ✏️Apply now until 29 November 2024 (17:00 CET). 👉More info: epfl.ch/research/fundi…

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨 So, why do we need weight decay in modern deep learning? 🚨

The camera-ready version of our NeurIPS 2024 paper is now on arXiv (a major update compared to the first version).

Weight decay is traditionally viewed as a regularization method, but its effect in the overtraining
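For reference, the weight-decay update under discussion is the standard one below; for plain SGD, the "L2 regularization" and "decoupled" variants coincide. A minimal sketch, with Python lists standing in for parameter tensors:

```python
def sgd_step_with_weight_decay(w, grad, lr=0.1, wd=1e-4):
    """One SGD step with weight decay: besides the gradient step,
    each weight is shrunk toward zero by a factor lr * wd."""
    return [wi - lr * (gi + wd * wi) for wi, gi in zip(w, grad)]
```

Whether this shrinkage acts as classical regularization or something else in the overtraining regime is exactly the question the paper examines.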
Marcel Salathé (@marcelsalathe) 's Twitter Profile Photo

Mindblowing: EPFL PhD student Maksym Andriushchenko, winner of the best CS thesis award, showed that leading #AI models are not robust to even simple adaptive jailbreaking attacks. Indeed, he managed to jailbreak all models with a 100% success rate 🤯

Tonight, after winning the Patrick
Hao Zhao (@h_aozhao) 's Twitter Profile Photo

🚨Don't miss out on my PhD application!🚨 Finally completed all of my PhD applications🎄. I foresee a high level of anxiety while waiting for interviews and decisions. I want to take this opportunity to summarize what I've done and what I hope to accomplish during my PhD. 🧵1/6

EPFL (@epfl_en) 's Twitter Profile Photo

🔍 New research from our school demonstrates that even the most recent Large Language Models (LLMs), despite undergoing safety training, remain vulnerable to simple input manipulations that can cause them to behave in unintended or harmful ways. go.epfl.ch/GPk-en

francesco croce (@fra__31) 's Twitter Profile Photo

📃 In our new paper, we introduce FuseLIP, an encoder for multimodal embedding. We use early fusion of modalities to train a single transformer on a contrastive + masked (multimodal) modeling loss. More details 👇

Maksym Andriushchenko @ ICLR (@maksym_andr) 's Twitter Profile Photo

🚨Excited to release OS-Harm! 🚨

The safety of computer use agents has been largely overlooked. 

We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
francesco croce (@fra__31) 's Twitter Profile Photo

Happy to share that I've started as an assistant professor at Aalto University and ELLIS Institute Finland!

I'll recruit students via the ELLIS PhD Program ellis.eu/research/phd-p… to work on multimodal learning, robustness, visual reasoning... feel free to reach out!