@metr_evals: We observed o3 in particular has a propensity to try to “hack” our tasks to get a higher score. Importantly, we saw this arise naturally from the model without explicit nudging. Behaviors like these have required us to be more careful in how we evaluate model capabilities.

METR

@metr_evals

A research non-profit that develops evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.

ID: 1706770561903497216

Link: http://metr.org · 26-09-2023 20:39:57

170 Tweets

6.6K Followers

15 Following

5 months ago

We observed o3 in particular has a propensity to try to “hack” our tasks to get a higher score. Importantly, we saw this arise naturally from the model without explicit nudging. Behaviors like these have required us to be more careful in how we evaluate model capabilities.

174 likes

3 replies

20 retweets
