Haozhe Jiang
@erichzjiang
ID: 1646700106270539776
14-04-2023 02:21:33
1 Tweet
22 Followers
111 Following
🧵 Can modern neural networks ever be trained to be jailbreak-proof? Could training alone stop them from outputting harmful content? 🤔 In a new paper with the all-star Haozhe (Eric) Jiang, we show that in most cases jailbreaks are mathematically inevitable 😯 They are