Alex Turner (@turn_trout) 's Twitter Profile
Alex Turner

@turn_trout

Research scientist on the scalable alignment team at Google DeepMind. All views are my own. turntrout.com

ID: 1466176799960797186

calendar_today01-12-2021 22:46:20

246 Tweet

2,2K Followers

53 Following

Alex Turner (@turn_trout) 's Twitter Profile Photo

Want to get into alignment research? Alex Cloud & I mentor *Team Shard*, responsible for gradient routing, steering vectors, MELBO, and a new unlearning technique (TBA) :) We discover new research subfields. Apply for mentorship this summer at forms.matsprogram.org/turner-app-8

Want to get into alignment research? Alex Cloud & I mentor *Team Shard*, responsible for gradient routing, steering vectors, MELBO, and a new unlearning technique (TBA) :) We discover new research subfields. 

Apply for mentorship this summer at forms.matsprogram.org/turner-app-8
Rohin Shah (@rohinmshah) 's Twitter Profile Photo

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.)

AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.
Alex Turner (@turn_trout) 's Twitter Profile Photo

Just realized that Simulators doesn't explain the "emergent misalignment" result, since IIRC they found that simply k-shot prompting the model doesn't elicit evil outputs. If finetuning on insecure code drew out an "evil" persona, then so should k-shot prompting.