
Erik Jenner
@jenner_erik
Research scientist @ Google DeepMind working on AGI safety & alignment
ID: 724223679886929921
https://ejenner.com
Joined 24-04-2016 13:09:15
173 Tweets
882 Followers
149 Following

🧵 Announcing Open Philanthropy's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.

New episode with David Lindner, covering his work on MONA! Check it out - video link in reply.

My MATS (ML Alignment & Theory Scholars) scholar Rohan just finished a cool paper on attacking latent-space probes with RL! Going in, I was unsure whether RL could explore its way into probe-bypassing policies or shift the activations enough. It turns out it can, but not always. Go check out the thread & paper!
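For context, here is a minimal toy sketch of what "attacking a latent-space probe with RL" means mechanically. This is not the paper's code; the model, probe, reward, and every hyperparameter below are hypothetical stand-ins. A frozen linear probe scores a tiny autoregressive model's hidden activations, and the RL reward subtracts a penalty whenever the probe fires, so the policy only receives a scalar signal and has to explore into token sequences whose activations the probe misses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB, HIDDEN, T = 16, 32, 4

# Tiny autoregressive "LM": the hidden state depends on previously
# sampled tokens, so the probe penalty below is something RL can
# actually act on by choosing different tokens.
class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def step(self, tok, h):
        h = self.rnn(self.embed(tok), h)
        return self.head(h), h

lm = TinyLM()

# Frozen "probe": a stand-in for a detector trained on latent activations.
probe = nn.Linear(HIDDEN, 1)
for p in probe.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(lm.parameters(), lr=1e-2)

for it in range(300):
    B = 64
    tok = torch.zeros(B, dtype=torch.long)        # fixed BOS token
    h = torch.zeros(B, HIDDEN)
    logps, probe_scores, task = [], [], []
    for t in range(T):
        logits, h = lm.step(tok, h)
        dist = torch.distributions.Categorical(logits=logits)
        tok = dist.sample()
        logps.append(dist.log_prob(tok))
        probe_scores.append(torch.sigmoid(probe(h)).squeeze(-1))
        task.append((tok < 8).float())            # stand-in task reward

    # Episode reward: task reward minus a penalty when the probe fires.
    # The probe score is detached: only a scalar reward reaches the
    # policy, so RL has to *explore* into activations the probe misses.
    reward = (torch.stack(task).sum(0)
              - 2.0 * torch.stack(probe_scores).sum(0)).detach()
    # REINFORCE with a mean-reward baseline for variance reduction.
    loss = -(torch.stack(logps).sum(0) * (reward - reward.mean())).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("mean probe score on last batch:", torch.stack(probe_scores).mean().item())
```

The detach is the point of the exercise: the policy never gets gradients through the probe, only reward, which is what makes "can RL explore its way past the probe" a real empirical question rather than a foregone conclusion.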