Claudia (@claudiabaclava)'s Twitter Profile
Claudia

@claudiabaclava

sense

ID: 4256954171

Link: https://www.researchgate.net/profile/Claudia-Claros-2 · Joined: 23-11-2015 11:54:50

16 Tweets

93 Followers

741 Following


"Could we detect & remove an AI system's deceptive strategy? (...) Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety." arxiv.org/abs/2401.05566