Ethan Perez (@ethanjperez)'s Twitter Profile
Ethan Perez

@ethanjperez

Large language model safety

ID: 908728623988953089

Link: https://scholar.google.com/citations?user=za0-taQAAAAJ

Joined: 15-09-2017 16:26:02

1.1K Tweets

7.7K Followers

507 Following

Ethan Perez (@ethanjperez):

Gradient-based adversarial image attacks/jailbreaks don't seem to transfer across vision-language models, unless the models are *really* similar. This is good (and IMO surprising) news for the robustness of VLMs! Check out our new paper on when these attacks do/don't transfer:
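For readers unfamiliar with the technique the tweet references: a gradient-based adversarial image attack perturbs an input image, within a small pixel budget, to push a model toward an attacker-chosen output; a "transfer" attack is one crafted on a source model that still fools a different model. Below is a minimal PGD-style sketch of this idea, not the paper's method. The stand-in models, loss, and hyperparameters (eps, step size, iteration count) are all illustrative assumptions; a real VLM jailbreak would instead maximize the likelihood of a target completion under the model's language head.

```python
# Minimal PGD-style adversarial image attack sketch (illustrative, NOT the
# paper's method). Tiny stand-in models keep the example self-contained.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a vision(-language) model's differentiable head.
victim_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def pgd_attack(model, image, target, eps=8 / 255, step=2 / 255, iters=40):
    """Projected gradient descent toward `target` within an L-inf ball of radius eps."""
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(adv), target)
        grad, = torch.autograd.grad(loss, adv)
        # Step *against* the loss gradient to make the target more likely.
        adv = adv.detach() - step * grad.sign()
        adv = image + (adv - image).clamp(-eps, eps)  # project into the eps-ball
        adv = adv.clamp(0, 1)                         # stay a valid image
    return adv

image = torch.rand(1, 3, 32, 32)   # clean input in [0, 1]
target = torch.tensor([3])         # attacker-chosen target output
adv_image = pgd_attack(victim_model, image, target)

# Transfer test: does a perturbation crafted on one model fool another?
other_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
print("source model target prob:",
      victim_model(adv_image).softmax(-1)[0, 3].item())
print("transfer model target prob:",
      other_model(adv_image).softmax(-1)[0, 3].item())
```

The tweet's finding, in these terms: the perturbation reliably raises the target probability on the source model, but (per the paper) typically fails to do so on a second VLM unless the two models are very similar.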