
Lukas Aichberger
@aichberger
PhD Student at the Institute for Machine Learning @JKULinz and @OATML_Oxford as part of @ELLISforEurope
ID: 1467768913178112001
06-12-2021 08:12:58
36 Tweets
181 Followers
175 Following

Sebastian Farquhar We looked into the theory about this in our recent work on how to efficiently obtain samples to estimate semantic entropy. We also found that this correct estimator boosts performance a lot: arxiv.org/abs/2406.04306

Defending against adversarial prompts is hard; defending against fine-tuning API attacks is much harder. In our new AI Security Institute pre-print, we break alignment and extract harmful info using entirely benign and natural interactions during fine-tuning & inference. 😮 🧵 1/10

Exciting new paper! We show how #Agentic #AI, web-based Agentic AI in particular, can be jailbroken and made to propagate these jailbreaks at scale, just by posting images on social media. A system-level attack beyond just VLMs. Great work led by Lukas Aichberger