Dr. Karen Ullrich (@karen_ullrich)'s Twitter Profile
Dr. Karen Ullrich

@karen_ullrich

Research scientist at FAIR NY + collab w/ Vector Institute. ❤️ Machine Learning + Information Theory. Previously, PhD at UoAmsterdam, intern at DeepMind + MSRC.

ID: 2236492597

Link: http://karenullrich.info
Joined: 08-12-2013 19:32:19

265 Tweets

5.5K Followers

586 Following

Even with preference alignment, LLMs can be enticed into harmful behavior via adversarial prompts 😈. 🚨 Breaking: our theoretical findings confirm: LLM alignment is fundamentally limited! More details, on framework, statistical bounds and phenomenal defense results 👇🏻
