
Egor Zverev @ICLR 2025
@egor_zverev_ai
ML Safety PhD@ISTA
ID: 1770024348675399680
https://github.com/egozverev/ 19-03-2024 10:15:33
32 Tweets
57 Followers
160 Following

We've released the source code for ASIDE (presented as an Oral at the #ICLR2025 BuildTrust workshop)! ASIDE boosts prompt injection robustness without safety-tuning: we simply rotate embeddings of marked tokens by 90° during instruction-tuning and inference. code
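
To make the "rotate embeddings by 90°" idea concrete, here is a minimal sketch in plain Python. It is not taken from the released code. The tweet does not say which rotation is used, so this sketch assumes one simple choice: each adjacent coordinate pair (x, y) is mapped to (-y, x), which turns the whole vector by 90° and keeps its length. The names `rotate_90`, `embed_with_marks`, and `is_marked` are hypothetical.

```python
from typing import List, Sequence


def rotate_90(vec: Sequence[float]) -> List[float]:
    """Rotate a vector by 90 degrees (assumed pairwise form: (x, y) -> (-y, x))."""
    if len(vec) % 2 != 0:
        raise ValueError("embedding dimension must be even for pairwise rotation")
    out: List[float] = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        out.extend((-y, x))
    return out


def embed_with_marks(
    embeddings: Sequence[Sequence[float]],
    is_marked: Sequence[bool],
) -> List[List[float]]:
    """Rotate only the embeddings of marked tokens; copy the rest unchanged."""
    if len(embeddings) != len(is_marked):
        raise ValueError("need exactly one mark flag per token embedding")
    return [
        rotate_90(vec) if marked else list(vec)
        for vec, marked in zip(embeddings, is_marked)
    ]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here to check orthogonality."""
    return sum(x * y for x, y in zip(a, b))


# Toy example: two tokens with 4-dimensional embeddings, second one marked.
token_embeddings = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
marks = [False, True]
result = embed_with_marks(token_embeddings, marks)
```

Under this assumption, the rotated copy of a vector is orthogonal to the original and has the same norm, so the same token gets a clearly different embedding when it is marked than when it is not. The same function would be applied in both instruction-tuning and inference, as the tweet describes.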
