@fazlbarez: New paper🚨: We introduce POISONBENCH, a benchmark for assessing LLM vulnerabilities to data poisoning during preference learning. Key finding: Even 3% poisoned data can cause up to 80% performance deviation when triggered. 🧵

Fazl Barez

@fazlbarez

Making AI safe one Google doc at a time| Let's build AI's we can trust!

ID: 1341019917005537280

Link: https://fbarez.github.io

Joined: 21-12-2020 13:57:26

464 Tweets

1.1K Followers

729 Following

Fazl Barez

@fazlbarez

10 months ago

New paper🚨: We introduce POISONBENCH, a benchmark for assessing LLM vulnerabilities to data poisoning during preference learning. Key finding: Even 3% poisoned data can cause up to 80% performance deviation when triggered. 🧵

47 likes

1 reply

15 retweets
