Cozmin Ududec (@cududec)'s Twitter Profile
Cozmin Ududec

@cududec

@AISecurityInst Testing and Science of Evals. Ex quantum foundationalist.

ID: 1404056967220432899

13-06-2021 12:45:38

354 Tweets

264 Followers

1.1K Following

Quanta Magazine (@quantamagazine)

One hundred years ago, a 23-year-old postdoc named Werner Heisenberg completed a calculation that would become the heart of quantum mechanics, a radical yet stunningly accurate theory of the atomic and subatomic world. quantamagazine.org/its-a-mess-a-b…

Stella Biderman (@blancheminerva)

Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons? @AIEleuther and AI Security Institute joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study

Transluce (@transluceai)

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

David Duvenaud (@davidduvenaud)

I'm glad to see a serious LLM forecasting effort. These kinds of forecasts seem like an undersupplied public good. I think good policy-conditional long-term forecasts will play a big part in avoiding bad outcomes for humanity, if we can get them set up in time.

Ryan Kidd (@ryan_kidd44)

MATS 9.0 applications are open! Launch your career in AI alignment, governance, and security with our 12-week research program. MATS provides field-leading research mentorship, funding, Berkeley & London offices, housing, and talks/workshops with AI experts.

Cozmin Ududec (@cududec)

I'll be a MATS mentor this winter! (Jan-Mar 2026) Come work with me on methods for improving dangerous capability evals, and understanding agent behaviours and goals. Apply by Oct 2nd – matsprogram.org/apply#Ududec

Robert Kirk (@_robertkirk)

We at AI Security Institute recently did our first pre-deployment 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 evaluation of Anthropic's Claude Sonnet 4.5! This was a first attempt – and we plan to work on this more! – but we still found some interesting results, and some learnings for next time 🧵

AI Security Institute (@aisecurityinst)

Several AI developers aim to build systems that match or surpass humans across most cognitive tasks. Today’s AI still falls short. Our new report maps progress and highlights the key barriers that remain🧵

Cozmin Ududec (@cududec)

I really like this research programme aiming to understand goal directedness from the bottom up! Great example of how to combine conceptual clarity with systematic experiments.