Cas (Stephen Casper) (@stephenlcasper)'s Twitter Profile
Cas (Stephen Casper)

@stephenlcasper

AI technical governance & risk management research. PhD Candidate @MIT_CSAIL / @MITEECS. Also at scasper.bsky.social.

stephencasper.com

ID: 704559922143322112

Link: http://stephencasper.com
Joined: 01-03-2016 06:52:30

1.1K Tweets

4.4K Followers

3.3K Following

📣 New paper

AI gov. frameworks are being designed to rely on rigorous assessments of capabilities & risks. But risk evals are [still] pretty bad – they regularly fail to find overtly harmful behaviors that surface post-deployment.

Model tampering attacks can help with this.