Kjartan Krange (@kjartankrange)'s Twitter Profile
Kjartan Krange

@kjartankrange

ID: 1496596801297829889

23-02-2022 21:24:51

17 Tweets

15 Followers

138 Following

Marius Hobbhahn (@mariushobbhahn):

Unfortunately, we're now at the point where new models have really high eval awareness. For every alignment eval score I see, I now add a mental asterisk: *the model could have also just realized it's being evaluated, who knows. And I think that's concerning!