Owain Evans
@owainevans_uk
Runs an AI Safety research group in Berkeley (Truthful AI) + Affiliate at UC Berkeley. Past: Oxford Uni, TruthfulQA, Reversal Curse. Prefer email to DM.
ID: 1247872005912891392
https://owainevans.github.io/
08-04-2020 13:01:26
5.5K Tweets
11.11K Followers
322 Following
AIs sometimes blackmail or sabotage shutdown mechanisms if told they're going to be turned off. Neel Nanda's team at Google DeepMind investigated and found that in this case it wasn't at all what it seemed. x.com/robertwiblin/s…
What is happening in society and politics after widespread automation? What are the best ideas for good post-AGI futures, if any? David Duvenaud joins the podcast — pnc.st/s/forecast/163…
🪩The one and only State of AI 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
New AI Security Institute research with Anthropic + The Alan Turing Institute: the number of samples needed to backdoor-poison LLMs stays nearly CONSTANT as models scale. With just 500 samples, we insert backdoors into LLMs from 600M to 13B params, even as training data scales 20x. 🧵/11
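The striking part of the result above is the arithmetic: if the number of poisoned samples stays fixed while the clean data grows 20x, the poisoned *fraction* of the corpus shrinks 20x, yet the backdoor reportedly still takes hold. A rough sketch of that arithmetic, using illustrative dataset sizes (the actual corpus sizes are not stated in the tweet):

```python
# Sketch (not from the thread): a fixed ~500 poisoned samples becomes a
# vanishingly small fraction of the corpus as training data scales 20x.
POISON_SAMPLES = 500  # held constant across model sizes, per the tweet

# Hypothetical clean-dataset sizes spanning the reported 20x data scaling.
clean_samples = {
    "600M-param model": 10_000_000,
    "13B-param model": 200_000_000,  # 20x more data
}

for model, n_clean in clean_samples.items():
    frac = POISON_SAMPLES / (n_clean + POISON_SAMPLES)
    print(f"{model}: poison fraction = {frac:.6%}")
```

The point of the sketch is only that the attack budget is measured in absolute samples, not in fraction of the dataset, which is what makes the near-constant result surprising.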