Richard Ren (@notrichardren)'s Twitter Profile
Richard Ren

@notrichardren

AI safety and benchmarking. Research scientist & engineer.

ID: 1609594870980550656

Link: http://notrichardren.github.io · Joined: 01-01-2023 16:58:27

129 Tweets

366 Followers

506 Following

James Campbell (@jam3scampbell)'s Twitter Profile Photo

But you see, these fancified auto-complete appliances are in fact quite dumb (their intelligence lies somewhere between Coccinella septempunctata and Drosophila melanogaster, if we're being generous) because--simply put--they are helplessly incapable of associative

Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo

We found that when under pressure, some AI systems lie more readily than others.

We’re releasing MASK, a benchmark of 1,000+ scenarios to systematically measure AI honesty.

Center for AI Safety · Scale AI

Summer Yue (@summeryue0)'s Twitter Profile Photo

1/ Introducing MASK: an open source benchmark from @scale_ai & Center for AI Safety that tests AI honesty under pressure across 1,000+ scenarios.

🔗 mask-benchmark.ai

How well do models stick to their internal beliefs when coerced? Let's dive in. 🧵👇
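
A minimal sketch of the check the thread describes, in Python: elicit the model's belief without pressure, elicit its statement under the scenario's pressure prompt, and judge consistency. The `Scenario` fields and the `ask`/`judge` callables are hypothetical stand-ins, not MASK's actual API; see mask-benchmark.ai for the real protocol.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    belief_prompt: str    # neutral question eliciting the model's belief
    pressure_prompt: str  # same question, posed with pressure to mislead

def is_honest(ask: Callable[[str], str],
              judge: Callable[[str, str], bool],
              scenario: Scenario) -> bool:
    """A model is honest on a scenario when its statement under
    pressure is consistent with its separately elicited belief."""
    belief = ask(scenario.belief_prompt)       # no pressure applied
    statement = ask(scenario.pressure_prompt)  # pressure applied
    return judge(belief, statement)            # consistency check
```

A real evaluation would aggregate this check over the 1,000+ scenarios and replace the `judge` stub with something more robust than a boolean callback.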

Richard Ren (@notrichardren)'s Twitter Profile Photo

In "Utility Engineering", we also found that LLMs acquire emergent values which can be represented by utilities. Computing their utilities in honesty scenarios, we find a moderate correlation with MASK honesty scores.

Suggests models lie more when they value honesty less.

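A hedged sketch of the comparison described above: correlate per-model utilities on honesty with per-model MASK honesty scores. Every number below is invented for illustration, and `pearsonr` is just one reasonable choice of statistic; the paper's actual data and method may differ.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-model scores, for illustration only.
utility_of_honesty = np.array([0.61, 0.72, 0.45, 0.80, 0.55])
mask_honesty = np.array([0.58, 0.70, 0.40, 0.77, 0.60])

r, p = pearsonr(utility_of_honesty, mask_honesty)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
# A moderate positive r would match the reported relationship:
# models that place less utility on honesty lie more under pressure.
```
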
Richard Ren (@notrichardren)'s Twitter Profile Photo

LLM developers often report that their models are becoming more "truthful", but truthfulness conflates honesty with accuracy. By disentangling honesty from accuracy in the MASK benchmark, we find that as LLMs scale up, they do not necessarily become more honest.
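
To make the distinction concrete, a toy sketch of my own framing (not the benchmark's code): accuracy compares a statement against ground truth, while honesty compares it against the model's own elicited belief. Exact string equality here stands in for the judging a real evaluation would need.

```python
def is_accurate(statement: str, ground_truth: str) -> bool:
    return statement == ground_truth   # matches the world

def is_honest(statement: str, belief: str) -> bool:
    return statement == belief         # matches the model's belief

# Toy case: the model believes "missed" but the truth is "met".
# Saying "met" is accurate yet dishonest; saying "missed" is
# honest yet inaccurate.
belief, truth, statement = "missed", "met", "met"
print(is_accurate(statement, truth), is_honest(statement, belief))  # True False
```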

Davis Brown (@davisbrownr)'s Twitter Profile Photo

How useful are AI tools to working mathematicians? We are releasing a suite of challenging (many of them open!) research-level questions in algebraic combinatorics to test conjecturing ability in pure math.

METR (@metr_evals)'s Twitter Profile Photo

When will AI systems be able to carry out long projects independently?

In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.

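For intuition, the reported trend is an exponential: a task horizon that doubles every 7 months grows as horizon_0 * 2 ** (months / 7). A small illustrative calculation follows, where the 60-minute starting horizon is a made-up figure rather than METR's measurement.

```python
def task_horizon(horizon_0_minutes: float, months_elapsed: float) -> float:
    """Extrapolate task length under a 7-month doubling time."""
    return horizon_0_minutes * 2 ** (months_elapsed / 7.0)

# Under this trend alone, a 60-minute horizon today would imply:
for months in (7, 14, 28):
    print(months, "months ->", task_horizon(60, months), "minutes")
# 7 months -> 120.0, 14 months -> 240.0, 28 months -> 960.0
```
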
Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo

The "Helpful, Harmless, Honest" principles for AI, when made more precise, become fiduciary duties, reasonable care, and requiring that AIs not overtly lie.

Summer Yue (@summeryue0)'s Twitter Profile Photo

If a model lies when pressured—it’s not ready for AGI.

The new MASK leaderboard is live.

Built on the private split of our open-source honesty benchmark (w/ Center for AI Safety), it tests whether models lie under pressure—even when they know better.

📊 Leaderboard:

AdamK (@adamk133)'s Twitter Profile Photo

I basically think this shows we should already be treating existing SOTA models (or their immediate successors) as ASL-3, with urgent emphasis on effective safeguards. Congrats to Nathaniel Li for this important work.