Egor Zverev @ICLR 2025 (@egor_zverev_ai)'s Twitter Profile
Egor Zverev @ICLR 2025

@egor_zverev_ai

ML Safety PhD@ISTA

ID: 1770024348675399680

Link: https://github.com/egozverev/

Joined: 19-03-2024 10:15:33

32 Tweets

57 Followers

160 Following

🚀 We've released the source code for ASIDE (presented as an Oral at the #ICLR2025 BuildTrust workshop)!

ASIDE boosts prompt injection robustness without safety-tuning: we simply rotate embeddings of marked tokens by 90° during instruction-tuning and inference

👇 Code
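
To make the "rotate embeddings of marked tokens by 90°" idea concrete, here is a minimal sketch in plain Python. It is not the released ASIDE code. It assumes the rotation acts on adjacent pairs of embedding dimensions (one simple way to get a 90° rotation in an even-dimensional space), and the names `rotate_90` and `embed_sequence` are hypothetical. How tokens get marked is decided elsewhere and is represented here only as a list of booleans.

```python
from typing import List, Sequence


def rotate_90(embedding: Sequence[float]) -> List[float]:
    """Rotate an even-dimensional vector by 90 degrees.

    Assumption: each adjacent pair (x, y) is mapped to (-y, x), so the
    result is orthogonal to the input and has the same norm.
    """
    if len(embedding) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    rotated: List[float] = []
    for i in range(0, len(embedding), 2):
        x, y = embedding[i], embedding[i + 1]
        rotated.extend([-y, x])
    return rotated


def embed_sequence(
    embeddings: Sequence[Sequence[float]],
    is_marked: Sequence[bool],
) -> List[List[float]]:
    """Rotate the embeddings of marked tokens; leave the others unchanged."""
    if len(embeddings) != len(is_marked):
        raise ValueError("embeddings and is_marked must have equal length")
    return [
        rotate_90(vec) if marked else list(vec)
        for vec, marked in zip(embeddings, is_marked)
    ]


if __name__ == "__main__":
    token_embeddings = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    marks = [False, True]  # hypothetical marking: only the second token
    for vec in embed_sequence(token_embeddings, marks):
        print(vec)
```

In this sketch the same token receives a different, orthogonal embedding depending on whether it is marked, and the same transformation would be applied both during instruction-tuning and at inference, as the post describes.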