Leopold Aschenbrenner (@leopoldasch)'s Twitter Profile
Leopold Aschenbrenner

@leopoldasch

superalignment @ openai

ID: 2989966781

http://forourposterity.com · Joined 21-01-2015 14:18:33

1.0K Tweets

12.6K Followers

3.5K Following

Sholto Douglas (@_sholtodouglas)

One of the best parts of SF is hanging out with my good friends Dwarkesh Patel and Trenton Bricken.

Dwarkesh is the best interviewer in the world, and I hope this gives you a good feeling for what it’s like to be on the ground in the labs. It only gets crazier from here!

Leopold Aschenbrenner (@leopoldasch)

One year since GPT-4 release. Hope you all enjoyed some time to relax; it’ll have been the slowest 12 months of AI progress for quite some time to come.

Dan Hendrycks (@DanHendrycks)

Can hazardous knowledge be unlearned from LLMs without harming other capabilities?

We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge.

📝arxiv.org/abs/2403.03218
🔗wmdp.ai

Roger Grosse (@RogerGrosse)

Here's what I see as a likely AGI trajectory over the next decade.

I claim that later parts of the path present the biggest alignment risks/challenges. The alignment world has been focusing a lot on the lower left corner lately, which I'm worried is somewhat of a Maginot line.

Nat Friedman (@natfriedman)

Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD.

Today we are overjoyed to announce that our crazy project has succeeded. After 2000…

Leopold Aschenbrenner (@leopoldasch)

Churchill, 1924:

“Might not a bomb no bigger than an orange be found to possess a secret power to destroy a whole block of buildings — nay, to concentrate the force of a thousand tons of cordite and blast a township at a stroke?”

Incredible essay: akademician.files.wordpress.com/2019/08/church…

Noam Brown (@polynoamial)

A good example is Sholto Douglas at Google DeepMind. He's quiet on Twitter, doesn't have any flashy first-author publications, and has only been in the field for ~1.5 years, but people in AI know he was one of the most important people behind Gemini's success

Eric Schmidt (@ericschmidt)

openai.com/blog/superalig…
This group from OpenAI are among the smartest people I have ever met. I'm very pleased to be one of their supporters. Please review and apply to work with them!!!!!!!!!!!!

Aleksander Madry (@aleks_madry)

So happy about this release and grateful to my awesome Preparedness team (especially Tejal Patwardhan), Policy Research, SuperAlignment and all of OpenAI for the hard work it took to get us here. It is still only a start but the work will continue!

Neel Nanda (@NeelNanda5)

Cool work from Google DeepMind alignment on limitations of methods for eliciting a model's beliefs!

My key takeaway is that unsupervised methods (e.g. CCS) rely on 'proxy properties' of true beliefs, but other features share these proxies! E.g. 'agrees with the user' vs 'is true'
