Dan Hendrycks (@DanHendrycks)'s Twitter Profile
Dan Hendrycks

@DanHendrycks

• Director of the Center for AI Safety (https://t.co/ahs3LYCpqv)
• GELU/MMLU/MATH
• PhD in AI from UC Berkeley
https://t.co/rgXHAnYAsQ
https://t.co/nPSyQMaY9b

ID:68538286

Link: http://danhendrycks.com
Joined: 24-08-2009 23:08:53

600 Tweets

17.3K Followers

80 Following

Zvi Mowshowitz(@TheZvi) 's Twitter Profile Photo

My extensive Q&A on California's SB 1047. There are a lot of misconceptions to clear up. thezvi.substack.com/p/q-and-a-on-p…

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.

arxiv.org/abs/2405.00332

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Hinton and Bengio on SB 1047 and a summary of the bill.

Hinton: “SB 1047 takes a very sensible approach... I am still passionate about the potential for AI to save lives through improvements in science and medicine, but it’s critical that we have legislation with real teeth to

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

I would guess this likely won't hold up against better adversaries.
In making the RepE paper (ai-transparency.org) we explored using it for trojans ('sleeper agents') and found it didn't work after basic stress testing.

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

GPT-5 doesn't seem likely to be released this year.
Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute.
That would mean GPT-5 would have ~100x the compute of GPT-4, or 3 months of ~1 million H100s.
I doubt OpenAI has a 1 million GPU server ready.
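The arithmetic in the tweet above can be sanity-checked with a back-of-the-envelope sketch. The specific figures below are assumptions, not from the tweet: a widely cited external estimate of ~2e25 FLOP for GPT-4's training compute, ~1e15 FLOP/s peak per H100 (≈989 TFLOPS BF16 dense), and ~40% utilization during training.

```python
# Back-of-the-envelope check of the "~100x GPT-4 = 3 months of ~1M H100s" claim.
# All constants are assumed estimates, not figures from the tweet.

GPT4_FLOP = 2e25                 # assumed GPT-4 training compute (external estimate)
TARGET_FLOP = 100 * GPT4_FLOP    # two half-generation steps of ~10x each

H100_FLOPS = 1e15                # assumed peak throughput per H100
UTILIZATION = 0.4                # assumed model-FLOPs utilization
SECONDS = 90 * 24 * 3600         # ~3 months of wall-clock time

cluster_flop = 1_000_000 * H100_FLOPS * UTILIZATION * SECONDS
print(f"target: {TARGET_FLOP:.1e} FLOP")
print(f"1M H100s for 3 months: {cluster_flop:.1e} FLOP")
```

Under these assumptions the cluster delivers ~3e27 FLOP against a ~2e27 FLOP target, i.e. the same order of magnitude, so the tweet's estimate is internally consistent.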

Kevin Roose(@kevinroose) 's Twitter Profile Photo

AI researchers like Dan Hendrycks, who helped create the MMLU (essentially the SAT for chatbots) told me that leading benchmark tests have reached 'saturation' -- basically, they're too easy for today's LLMs -- and that we will soon need to develop harder tests to gauge model

Center for AI Safety(@ai_risks) 's Twitter Profile Photo

We’re excited to announce SafeBench, a competition to develop benchmarks for empirically assessing safety! There are $250,000 in prizes, with submissions closing on Feb 25th, 2025. This project is supported by Schmidt Sciences.

Visit: mlsafety.org/safebench

🧵(1/3)

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management.

Possible trigger: AI might suddenly become viewed as the

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

x.ai/blog/grok-os
Grok-1 is open-sourced.

Releasing Grok-1 increases LLMs' diffusion rate through society. Democratizing access helps us work through the technology's implications more quickly and increases our preparedness for more capable AI systems. Grok-1 doesn't pose

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

We've released a post about the looming risk of AI cyberattacks on critical infrastructure.
It notes that we are living under a 'cyberattack overhang.'
Advances in defensive techniques are of no help if defenders are not keeping up to date.

safe.ai/blog/cybersecu… by Steve Newman

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Reminder that 'Responsible Scaling Policies' are just non-binding proclamations and as such shouldn't be interpreted as a strong line of defense for safety.

Voluntary commitments can be easily violated without much social blowback. For example, responsible AI teams have been
