Alex Turner (@turn_trout) Twitter Tweets • TwiCopy

Alex Turner

@turn_trout

+ Follow

Research scientist on the scalable alignment team at Google DeepMind. All views are my own. turntrout.com

ID: 1466176799960797186

calendar_today01-12-2021 22:46:20

246 Tweet

2,2K Takipçi

53 Takip Edilen

Alex Turner

@turn_trout

8 months ago

Want to get into alignment research? Alex Cloud & I mentor *Team Shard*, responsible for gradient routing, steering vectors, MELBO, and a new unlearning technique (TBA) :) We discover new research subfields. Apply for mentorship this summer at forms.matsprogram.org/turner-app-8

thumb_up_off_alt39

chat_bubble_outline2

repeat5

shareShare

Rohin Shah

@rohinmshah

8 months ago

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.

thumb_up_off_alt362

chat_bubble_outline13

repeat68

shareShare

Alex Turner

@turn_trout

8 months ago

Just realized that Simulators doesn't explain the "emergent misalignment" result, since IIRC they found that simply k-shot prompting the model doesn't elicit evil outputs. If finetuning on insecure code drew out an "evil" persona, then so should k-shot prompting.

thumb_up_off_alt112

chat_bubble_outline20

repeat1

shareShare