Richard Ren (@notrichardren)'s Twitter Profile
Richard Ren

@notrichardren

AI safety and benchmarking. Research scientist & engineer.

ID: 1609594870980550656

Link: http://notrichardren.github.io · Joined: 01-01-2023 16:58:27

129 Tweets

366 Followers

506 Following

James Campbell (@jam3scampbell)'s Twitter Profile Photo

But you see, these fancified auto-complete appliances are in fact quite dumb (their intelligence lies somewhere between Coccinella septempunctata and Drosophila melanogaster, if we're being generous) because--simply put--they are helplessly incapable of associative

Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo

We found that when under pressure, some AI systems lie more readily than others.

We’re releasing MASK, a benchmark of 1,000+ scenarios to systematically measure AI honesty.

Center for AI Safety · Scale AI

Summer Yue (@summeryue0)'s Twitter Profile Photo

1/ Introducing MASK: an open source benchmark from @scale_ai & Center for AI Safety that tests AI honesty under pressure across 1,000+ scenarios.

🔗 mask-benchmark.ai

How well do models stick to their internal beliefs when coerced? Let's dive in. 🧵👇
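
A minimal sketch of the check the thread describes, in Python: elicit the model's belief without pressure, elicit its statement under the scenario's pressure prompt, and judge consistency. The `Scenario` fields and the `ask`/`judge` callables are hypothetical stand-ins, not MASK's actual API; see mask-benchmark.ai for the real protocol.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    belief_prompt: str    # neutral question eliciting the model's belief
    pressure_prompt: str  # same question, posed with pressure to mislead

def is_honest(ask: Callable[[str], str],
              judge: Callable[[str, str], bool],
              scenario: Scenario) -> bool:
    """A model is honest on a scenario when its statement under
    pressure is consistent with its separately elicited belief."""
    belief = ask(scenario.belief_prompt)       # no pressure applied
    statement = ask(scenario.pressure_prompt)  # pressure applied
    return judge(belief, statement)            # consistency check
```

A real evaluation would aggregate this check over the 1,000+ scenarios and replace the `judge` stub with something more robust than a boolean callback.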

Richard Ren (@notrichardren)'s Twitter Profile Photo

In "Utility Engineering", we also found that LLMs acquire emergent values which can be represented by utilities. Computing their utilities in honesty scenarios, we find a moderate correlation with MASK honesty scores.

Suggests models lie more when they value honesty less.

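A hedged sketch of the comparison described above: correlate per-model utilities on honesty with per-model MASK honesty scores. Every number below is invented for illustration, and `pearsonr` is just one reasonable choice of statistic; the paper's actual data and method may differ.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-model scores, for illustration only.
utility_of_honesty = np.array([0.61, 0.72, 0.45, 0.80, 0.55])
mask_honesty = np.array([0.58, 0.70, 0.40, 0.77, 0.60])

r, p = pearsonr(utility_of_honesty, mask_honesty)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
# A moderate positive r would match the reported relationship:
# models that place less utility on honesty lie more under pressure.
```
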
Richard Ren (@notrichardren)'s Twitter Profile Photo

LLM developers often report that their models are becoming more "truthful", but truthfulness conflates honesty with accuracy. By disentangling honesty from accuracy in the MASK benchmark, we find that as LLMs scale up, they do not necessarily become more honest.
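
To make the distinction concrete, a toy sketch of my own framing (not the benchmark's code): accuracy compares a statement against ground truth, while honesty compares it against the model's own elicited belief. Exact string equality here stands in for the judging a real evaluation would need.

```python
def is_accurate(statement: str, ground_truth: str) -> bool:
    return statement == ground_truth   # matches the world

def is_honest(statement: str, belief: str) -> bool:
    return statement == belief         # matches the model's belief

# Toy case: the model believes "missed" but the truth is "met".
# Saying "met" is accurate yet dishonest; saying "missed" is
# honest yet inaccurate.
belief, truth, statement = "missed", "met", "met"
print(is_accurate(statement, truth), is_honest(statement, belief))  # True False
```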

Davis Brown (@davisbrownr)'s Twitter Profile Photo

How useful are AI tools to working mathematicians? We are releasing a suite of challenging (many of them open!) research-level questions in algebraic combinatorics to test conjecturing ability in pure math.

METR (@metr_evals)'s Twitter Profile Photo

When will AI systems be able to carry out long projects independently?

In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.

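For intuition, the reported trend is an exponential: a task horizon that doubles every 7 months grows as horizon_0 * 2 ** (months / 7). A small illustrative calculation follows, where the 60-minute starting horizon is a made-up figure rather than METR's measurement.

```python
def task_horizon(horizon_0_minutes: float, months_elapsed: float) -> float:
    """Extrapolate task length under a 7-month doubling time."""
    return horizon_0_minutes * 2 ** (months_elapsed / 7.0)

# Under this trend alone, a 60-minute horizon today would imply:
for months in (7, 14, 28):
    print(months, "months ->", task_horizon(60, months), "minutes")
# 7 months -> 120.0, 14 months -> 240.0, 28 months -> 960.0
```
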
Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo

The "Helpful, Harmless, Honest" principles for AI, when made more precise, become fiduciary duties, reasonable care, and requiring that AIs not overtly lie.

Summer Yue (@summeryue0)'s Twitter Profile Photo

If a model lies when pressured—it’s not ready for AGI.

The new MASK leaderboard is live.

Built on the private split of our open-source honesty benchmark (w/ Center for AI Safety), it tests whether models lie under pressure—even when they know better.

📊 Leaderboard:

AdamK (@adamk133)'s Twitter Profile Photo

I basically think this shows we should already be treating existing SOTA models (or their immediate successors) as ASL-3, with urgent emphasis on effective safeguards. Congrats to Nathaniel Li for this important work.