Cozmin Ududec (@cududec) Twitter Tweets • TwiCopy

Cozmin Ududec

@cududec

@AISecurityInst Testing and Science of Evals. Ex quantum foundationalist.

ID: 1404056967220432899

Joined: 13-06-2021 12:45:38

354 Tweets

264 Followers

1.1K Following

Quanta Magazine

@quantamagazine

3 months ago

One hundred years ago, a 23-year-old postdoc named Werner Heisenberg completed a calculation that would become the heart of quantum mechanics, a radical yet stunningly accurate theory of the atomic and subatomic world. quantamagazine.org/its-a-mess-a-b…

1.1K likes, 31 replies, 293 reposts

Stella Biderman

@blancheminerva

3 months ago

Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons? @AIEleuther and AI Security Institute joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study

556 likes, 28 replies, 72 reposts

Cozmin Ududec

@cududec

3 months ago

This is a really thoughtful and grounded discussion of the tradeoffs in scientific comms!

0 likes, 0 replies, 0 reposts

Transluce

@transluceai

3 months ago

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

205 likes, 7 replies, 36 reposts

David Duvenaud

@davidduvenaud

3 months ago

I'm glad to see a serious LLM forecasting effort. These kinds of forecasts seem like an undersupplied public good. I think good policy-conditional long-term forecasts will play a big part in avoiding bad outcomes for humanity, if we can get them set up in time.

45 likes, 6 replies, 4 reposts

David Duvenaud

@davidduvenaud

3 months ago

Last month we held a workshop on Post-AGI outcomes. Here’s a thread of all the talks! 🧵 x.com/DavidDuvenaud/…

178 likes, 15 replies, 35 reposts

Ryan Kidd

@ryan_kidd44

3 months ago

MATS 9.0 applications are open! Launch your career in AI alignment, governance, and security with our 12-week research program. MATS provides field-leading research mentorship, funding, Berkeley & London offices, housing, and talks/workshops with AI experts.

217 likes, 10 replies, 54 reposts

Cozmin Ududec

@cududec

3 months ago

I'll be a MATS mentor this winter! (Jan-Mar 2026) Come work with me on methods for improving dangerous capability evals, and understanding agent behaviours and goals. Apply by Oct 2nd – matsprogram.org/apply#Ududec

17 likes, 0 replies, 3 reposts

Robert Kirk

@_robertkirk

2 months ago

We at AI Security Institute recently did our first pre-deployment 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 evaluation of Anthropic's Claude Sonnet 4.5! This was a first attempt – and we plan to work on this more! – but we still found some interesting results, and some learnings for next time 🧵

49 likes, 3 replies, 12 reposts

AI Security Institute

@aisecurityinst

a month ago

Several AI developers aim to build systems that match or surpass humans across most cognitive tasks. Today’s AI still falls short. Our new report maps progress and highlights the key barriers that remain🧵

52 likes, 1 reply, 10 reposts

Cozmin Ududec

@cududec

19 days ago

I really like this research programme aiming to understand goal directedness from the bottom up! Great example of how to combine conceptual clarity with systematic experiments.

3 likes, 0 replies, 0 reposts