METR (@metr_evals) Twitter Tweets • TwiCopy

METR

@metr_evals


A research non-profit that develops evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.

ID: 1706770561903497216

Link: http://metr.org

Joined: 26-09-2023 20:39:57

170 Tweets

6.6K Followers

15 Following

Chris Painter

@chrispainteryup

8 months ago

METR is a non-profit dedicated to empirically measuring AI capabilities that could threaten catastrophic harm to society. Our main constraint is hiring great senior researchers, so we’re offering a $21,000 referral bonus. Our publishing velocity recently has been very high 👇🧵

Likes: 108

Replies: 4

Retweets: 8


Joel Becker

@joel_bkr

8 months ago

wicked preliminary result from Thomas Akira Kwa. AI time horizon, and doubling time of time horizon, seems to vary a lot by domain -- and METR's HCAST task suite is in the middle for both

Likes: 85

Replies: 2

Retweets: 13


Megan Kinniment

@mkinniment

7 months ago

AI agent performance on HCAST & RE-Bench seems to ‘plateau’ as agents are given more ‘time’ to do tasks. The best humans, on the other hand, seem to have less obvious plateaus. Some thoughts on this 🧵

Likes: 62

Replies: 3

Retweets: 7
