Anthropic (@anthropicai)'s Twitter Profile
Anthropic

@anthropicai

We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at Claude.ai.

ID: 1353836358901501952

Link: http://anthropic.com
Created: 25-01-2021 22:45:28

872 Tweets

515,515K Followers

35 Following

New Anthropic research: Auditing Language Models for Hidden Objectives. We deliberately trained a model with a hidden misaligned objective and put researchers to the test: Could they figure out the objective without being told?