Dan Hendrycks (@DanHendrycks)'s Twitter Profile
Dan Hendrycks

@DanHendrycks

• Director of the Center for AI Safety (https://t.co/ahs3LYCpqv)
• GELU/MMLU/MATH
• PhD in AI from UC Berkeley
https://t.co/rgXHAnYAsQ
https://t.co/nPSyQMaY9b

ID:68538286

Link: http://danhendrycks.com
Joined: 24-08-2009 23:08:53

600 Tweets

17.3K Followers

80 Following

Zvi Mowshowitz(@TheZvi) 's Twitter Profile Photo

My extensive Q&A on California's SB 1047. There are a lot of misconceptions to clear up. thezvi.substack.com/p/q-and-a-on-p…

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.

arxiv.org/abs/2405.00332

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Hinton and Bengio on SB 1047 and a summary of the bill.

Hinton: “SB 1047 takes a very sensible approach... I am still passionate about the potential for AI to save lives through improvements in science and medicine, but it’s critical that we have legislation with real teeth to

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

I would guess this likely won't hold up against better adversaries.
In making the RepE paper (ai-transparency.org) we explored using it for trojans ('sleeper agents') and found it didn't work after basic stress testing.

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

GPT-5 doesn't seem likely to be released this year.
Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute.
That would mean GPT-5 would have ~100x the compute of GPT-4, or 3 months of ~1 million H100s.
I doubt OpenAI has a 1 million GPU server ready.
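The arithmetic in the tweet above can be sanity-checked with a back-of-the-envelope sketch. The specific figures below are assumptions, not from the tweet: a widely cited external estimate of ~2e25 FLOP for GPT-4's training compute, ~1e15 FLOP/s peak per H100 (≈989 TFLOPS BF16 dense), and ~40% utilization during training.

```python
# Back-of-the-envelope check of the "~100x GPT-4 = 3 months of ~1M H100s" claim.
# All constants are assumed estimates, not figures from the tweet.

GPT4_FLOP = 2e25                 # assumed GPT-4 training compute (external estimate)
TARGET_FLOP = 100 * GPT4_FLOP    # two half-generation steps of ~10x each

H100_FLOPS = 1e15                # assumed peak throughput per H100
UTILIZATION = 0.4                # assumed model-FLOPs utilization
SECONDS = 90 * 24 * 3600         # ~3 months of wall-clock time

cluster_flop = 1_000_000 * H100_FLOPS * UTILIZATION * SECONDS
print(f"target: {TARGET_FLOP:.1e} FLOP")
print(f"1M H100s for 3 months: {cluster_flop:.1e} FLOP")
```

Under these assumptions the cluster delivers ~3e27 FLOP against a ~2e27 FLOP target, i.e. the same order of magnitude, so the tweet's estimate is internally consistent.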

Kevin Roose(@kevinroose) 's Twitter Profile Photo

AI researchers like Dan Hendrycks, who helped create the MMLU (essentially the SAT for chatbots) told me that leading benchmark tests have reached 'saturation' -- basically, they're too easy for today's LLMs -- and that we will soon need to develop harder tests to gauge model

Center for AI Safety(@ai_risks) 's Twitter Profile Photo

We’re excited to announce SafeBench, a competition to develop benchmarks for empirically assessing safety! There are $250,000 in prizes, with submissions closing on Feb 25th, 2025. This project is supported by Schmidt Sciences.

Visit: mlsafety.org/safebench

🧵(1/3)

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management.

Possible trigger: AI might suddenly become viewed as the

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

x.ai/blog/grok-os
Grok-1 is open-sourced.

Releasing Grok-1 increases LLMs' diffusion rate through society. Democratizing access helps us work through the technology's implications more quickly and increases our preparedness for more capable AI systems. Grok-1 doesn't pose

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

We've released a post about the looming risk of AI cyberattacks on critical infrastructure.
It notes that we are living under a 'cyberattack overhang.'
Advances in defensive techniques are of no help if defenders are not keeping up to date.

safe.ai/blog/cybersecu… by Steve Newman

Dan Hendrycks(@DanHendrycks) 's Twitter Profile Photo

Reminder that 'Responsible Scaling Policies' are just non-binding proclamations and as such shouldn't be interpreted as a strong line of defense for safety.

Voluntary commitments can be easily violated without much social blowback. For example, responsible AI teams have been
