METR (@metr_evals)'s Twitter Profile
METR

@metr_evals

A research non-profit that develops evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.

ID: 1706770561903497216

Link: http://metr.org

Joined: 26-09-2023 20:39:57

170 Tweets

6.6K Followers

15 Following

Chris Painter (@chrispainteryup)

METR is a non-profit dedicated to empirically measuring AI capabilities that could threaten catastrophic harm to society

Our main constraint is hiring great senior researchers, so we’re offering a $21,000 referral bonus

Our publishing velocity recently has been very high👇🧵

Joel Becker (@joel_bkr)

wicked preliminary result from Thomas Akira Kwa. AI time horizon, and doubling time of time horizon, seems to vary a lot by domain -- and METR's HCAST task suite is in the middle for both

Megan Kinniment (@mkinniment)

AI agent performance on HCAST & RE-Bench seems to ‘plateau’ as agents are given more ‘time’ to do tasks.

The best humans, on the other hand, seem to have less obvious plateaus.

Some thoughts on this🧵