Leopold Aschenbrenner (@leopoldasch) Twitter Tweets • TwiCopy

Leopold Aschenbrenner

@leopoldasch

superalignment @ openai

ID:2989966781

Link: http://forourposterity.com

Date: 21-01-2015 14:18:33

1,0K Tweets

12,6K Followers

3,5K Following

Sholto Douglas

@_sholtodouglas

1 month ago

One of the best parts of SF is hanging out with my good friends Dwarkesh Patel and Trenton Bricken.

Dwarkesh is the best interviewer in the world - and I hope this gives you a good feeling for what it’s like to be on the ground in the labs. It only gets crazier from here!

208 likes

Leopold Aschenbrenner

1 month ago

One year since GPT-4 release. Hope you all enjoyed some time to relax; it’ll have been the slowest 12 months of AI progress for quite some time to come.

1,9K likes

Dan Hendrycks

1 month ago

Can hazardous knowledge be unlearned from LLMs without harming other capabilities?

We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge.

📝arxiv.org/abs/2403.03218
🔗wmdp.ai

238 likes

Roger Grosse

2 months ago

Here's what I see as a likely AGI trajectory over the next decade.

I claim that later parts of the path present the biggest alignment risks/challenges. The alignment world has been focusing a lot on the lower left corner lately, which I'm worried is somewhat of a Maginot line.

513 likes

Jan Leike

2 months ago

This is a reminder that the application deadline is in less than 2 weeks!

44 likes

Nat Friedman

2 months ago

Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD.

Today we are overjoyed to announce that our crazy project has succeeded. After 2000…

70,7K likes

Dwarkesh Patel

3 months ago

@leopoldasch Churchill had an amazing (and underappreciated) track record as a futurist.

H/t Jason Crawford

rootsofprogress.org/winston-church…

31 likes

Leopold Aschenbrenner

3 months ago

Churchill, 1924:

“Might not a bomb no bigger than an orange be found to possess a secret power to destroy a whole block of buildings — nay, to concentrate the force of a thousand tons of cordite and blast a township at a stroke?”

Incredible essay: akademician.files.wordpress.com/2019/08/church…

31 likes

Noam Brown

3 months ago

A good example is Sholto Douglas at Google DeepMind. He's quiet on Twitter, doesn't have any flashy first-author publications, and has only been in the field for ~1.5 years, but people in AI know he was one of the most important people behind Gemini's success

784 likes

Leopold Aschenbrenner

3 months ago

we’ll be making plots like this of nvidia revenue forecasts in a few years

55 likes

Chris Olah

3 months ago

Backdoored models can resist safety training afterwards, with larger models resisting more effectively.

128 likes

Leopold Aschenbrenner

4 months ago

Very grateful to Eric Schmidt for helping make Superalignment Fast Grants possible!

40 likes

Eric Schmidt

4 months ago

openai.com/blog/superalig…
This group from OpenAI are among the smartest people I have ever met. I'm very pleased to be one of their supporters. Please review and apply to work with them!!!!!!!!!!!!

808 likes

Aleksander Madry

4 months ago

So happy about this release and grateful to my awesome Preparedness team (especially Tejal Patwardhan), Policy Research, SuperAlignment and all of OpenAI for the hard work it took to get us here. It is still only a start but the work will continue!

96 likes

Neel Nanda

4 months ago

Cool work from Google DeepMind alignment on limitations of methods for eliciting a model's beliefs!

My key takeaway is that unsupervised methods (eg CCS) rely on 'proxy properties' of true beliefs, but other features share these proxies! Eg 'agrees with the user' vs 'is true'

93 likes