Apollo Research (@apolloaievals): Tweets

Apollo Research

@apolloaievals

We are an AI evals research organisation

ID: 1655925560596373506

https://www.apolloresearch.ai/ · 09-05-2023 13:20:56

175 Tweets

5.5K Followers

0 Following

OpenAI

@openai

3 months ago

Today we’re releasing research with Apollo Research. In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing

2.2K likes · 219 replies · 341 reposts

Apollo Research

@apolloaievals

2 months ago

We tested Sonnet-4.5 before deployment:
- Significantly higher verbalized evaluation awareness (58% vs. 22% for Opus-4.1)
- It takes significantly fewer covert actions
- We don't know if the increased alignment scores come from better alignment or higher eval awareness

253 likes · 4 replies · 17 reposts