Ethan Perez (@ethanjperez)'s Twitter Profile
Ethan Perez

@ethanjperez

Large language model safety

ID: 908728623988953089

Link: https://scholar.google.com/citations?user=za0-taQAAAAJ

Joined: 15-09-2017 16:26:02

1.1K Tweets

7.7K Followers

507 Following

Ethan Perez (@ethanjperez):

Gradient-based adversarial image attacks/jailbreaks don't seem to transfer across vision-language models, unless the models are *really* similar. This is good (and IMO surprising) news for the robustness of VLMs! Check out our new paper on when these attacks do/don't transfer:
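For readers unfamiliar with the technique the tweet references: a gradient-based adversarial image attack perturbs an input image, within a small pixel budget, to push a model toward an attacker-chosen output; a "transfer" attack is one crafted on a source model that still fools a different model. Below is a minimal PGD-style sketch of this idea, not the paper's method. The stand-in models, loss, and hyperparameters (eps, step size, iteration count) are all illustrative assumptions; a real VLM jailbreak would instead maximize the likelihood of a target completion under the model's language head.

```python
# Minimal PGD-style adversarial image attack sketch (illustrative, NOT the
# paper's method). Tiny stand-in models keep the example self-contained.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a vision(-language) model's differentiable head.
victim_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def pgd_attack(model, image, target, eps=8 / 255, step=2 / 255, iters=40):
    """Projected gradient descent toward `target` within an L-inf ball of radius eps."""
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(adv), target)
        grad, = torch.autograd.grad(loss, adv)
        # Step *against* the loss gradient to make the target more likely.
        adv = adv.detach() - step * grad.sign()
        adv = image + (adv - image).clamp(-eps, eps)  # project into the eps-ball
        adv = adv.clamp(0, 1)                         # stay a valid image
    return adv

image = torch.rand(1, 3, 32, 32)   # clean input in [0, 1]
target = torch.tensor([3])         # attacker-chosen target output
adv_image = pgd_attack(victim_model, image, target)

# Transfer test: does a perturbation crafted on one model fool another?
other_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
print("source model target prob:",
      victim_model(adv_image).softmax(-1)[0, 3].item())
print("transfer model target prob:",
      other_model(adv_image).softmax(-1)[0, 3].item())
```

The tweet's finding, in these terms: the perturbation reliably raises the target probability on the source model, but (per the paper) typically fails to do so on a second VLM unless the two models are very similar.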