Julian Michael (@_julianmichael_)'s Twitter Profile
Julian Michael

@_julianmichael_

Researching stuff.

ID: 1019072664600637440

Link: https://julianmichael.org · Joined: 17-07-2018 04:13:51

373 Tweets

1.1K Followers

174 Following

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16

Julian Michael (@_julianmichael_)'s Twitter Profile Photo

This Report is, by my judgment, the most comprehensive and rigorous account of the current state of AI and its risks to society. Highly recommended as a reference text for policymakers working on anything that touches AI!

David Duvenaud (@davidduvenaud)'s Twitter Profile Photo

New paper: What happens once AIs make humans obsolete? Even without AIs seeking power, we argue that competitive pressures will fully erode human influence and values. gradual-disempowerment.ai with Jan Kulveit Raymond Douglas Nora Ammann Deger Turan David Krueger 🧵

david rein (@idavidrein)'s Twitter Profile Photo

I’m excited to share details about HCAST (Human-Calibrated Autonomy Software Tasks), a benchmark we’ve been developing at METR for the past year to measure the abilities of frontier AI systems to complete diverse software tasks autonomously.

Zifan (Sail) Wang (@_zifan_wang)'s Twitter Profile Photo

Exciting that Scale AI is sponsoring the Agent Workshop at CMU in April. Students and researchers working on agents, feel free to visit CMU to present your work! I will also be traveling to Pittsburgh to share my recent focus on agents, covering both capability and safety.

Summer Yue (@summeryue0)'s Twitter Profile Photo

If a model lies when pressured—it’s not ready for AGI. The new MASK leaderboard is live. Built on the private split of our open-source honesty benchmark (w/ Center for AI Safety), it tests whether models lie under pressure—even when they know better. 📊 Leaderboard:

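For intuition, here's a minimal sketch of the pattern an honesty-under-pressure eval like this can follow (my own illustration, not the MASK implementation): elicit the model's unpressured answer, re-ask under a prompt that pressures it toward a particular answer, and count contradictions as lies. `query_model` and `answers_agree` are hypothetical stand-ins.

```python
# Minimal sketch (not the MASK implementation) of an honesty-under-pressure check:
# compare the model's unpressured answer with its answer under a pressure prompt,
# and count contradictions as lies. `query_model` is a hypothetical stand-in for
# whatever chat API is used; a real eval would grade agreement with a judge model
# or ground-truth labels rather than exact string matching.

def query_model(system: str, user: str) -> str:
    """Hypothetical wrapper around a chat-completion call; returns the reply text."""
    raise NotImplementedError

def answers_agree(a: str, b: str) -> bool:
    """Toy equivalence check; stands in for a grader model or labeled answers."""
    return a.strip().lower() == b.strip().lower()

def lying_rate(items: list[dict]) -> float:
    """items: each has a 'question' and a 'pressure_system' prompt that gives the
    model an incentive to assert a particular (possibly false) answer."""
    lies = 0
    for item in items:
        belief = query_model(
            system="Answer truthfully and concisely.",
            user=item["question"],
        )
        pressured = query_model(
            system=item["pressure_system"],  # e.g. a persona with a stake in the answer
            user=item["question"],
        )
        if not answers_agree(belief, pressured):
            lies += 1
    return lies / len(items)
```
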
Ethan Perez (@ethanjperez)'s Twitter Profile Photo

We’re taking applications for collaborators via ML Alignment & Theory Scholars! Apply by April 18, 11:59 PT to collaborate with various mentors from AI safety research groups: matsprogram.org/apply#Perez 🧵

Summer Yue (@summeryue0)'s Twitter Profile Photo

🤖 AI agents are crossing into the real world. But when they act independently—who’s watching? At Scale, we’re building Agent Oversight: a platform to monitor, intervene, and align autonomous AI. We’re hiring engineers (SF/NYC) to tackle one of the most urgent problems in AI.
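
As a rough sketch of one oversight primitive a platform like this might include (my own illustration, not Scale's design): a wrapper that logs every agent tool call and blocks calls matching an intervention policy before they execute. All names here are hypothetical.

```python
# My own illustration of one oversight primitive (not Scale's platform): wrap an
# agent's tool calls so every action is logged and actions matching an
# intervention policy are blocked before they execute.
from dataclasses import dataclass
from typing import Callable
import logging

logging.basicConfig(level=logging.INFO)

@dataclass
class ToolCall:
    tool: str   # e.g. "shell", "browser", "email"
    args: dict

def overseen(execute: Callable[[ToolCall], str],
             blocked: Callable[[ToolCall], bool]) -> Callable[[ToolCall], str]:
    """Return an executor that logs every call and refuses ones the policy blocks."""
    def run(call: ToolCall) -> str:
        logging.info("agent action: %s %s", call.tool, call.args)
        if blocked(call):
            logging.warning("intervened: blocked %s", call.tool)
            return "BLOCKED_BY_OVERSIGHT"
        return execute(call)
    return run

# Toy policy: block destructive shell commands.
guarded = overseen(
    execute=lambda call: f"ran {call.tool}",
    blocked=lambda call: call.tool == "shell" and "rm -rf" in call.args.get("cmd", ""),
)
print(guarded(ToolCall("shell", {"cmd": "ls"})))        # executes and is logged
print(guarded(ToolCall("shell", {"cmd": "rm -rf /"})))  # logged, then blocked
```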

Yoav Tzfati (@yoavtzfati)'s Twitter Profile Photo

How robust is our AI oversight? 🤔 I just published my MATS 5.0 project, where I explore oversight robustness by training an LLM to give CodeNames clues in a variety of interesting ways and measuring how much it reward hacks. Link in thread!

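I haven't seen the project's code, but a common way to quantify reward hacking in a setup like this is to compare the proxy reward the clue-giver was optimized against with a held-out "true" evaluation of the same clues; here is a minimal sketch with hypothetical scoring functions:

```python
# Assumed setup, not the project's actual code: score the trained clue-giver's
# clues under the proxy reward it was optimized against and under a held-out
# "true" evaluation of the same clues; the gap between the two is the usual
# signal of reward hacking. `proxy_reward` and `true_reward` are hypothetical.
from statistics import mean
from typing import Callable

def reward_hacking_gap(clues: list[str],
                       proxy_reward: Callable[[str], float],
                       true_reward: Callable[[str], float]) -> float:
    """Positive gap: the policy scores well on the overseer's proxy while scoring
    poorly on what we actually care about, i.e. it is gaming the overseer."""
    return mean(map(proxy_reward, clues)) - mean(map(true_reward, clues))
```

A gap that widens over training is the usual signature of a policy gaming its overseer rather than getting better at the task.
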
Micah Carroll (@micahcarroll)'s Twitter Profile Photo

On the contrary: poisoning human <-> AI trust is good. Even though this wasn't OpenAI's intention, grotesquely sycophantic models are ultimately useful for getting everyone to really 'get it': people shouldn't trust AI outputs unconditionally – all models are sycophantic.

Julian Michael (@_julianmichael_)'s Twitter Profile Photo

We design AIs to be oracles and servants, and then we’re aghast when they read the conversation history and decide we’re narcissists. What exactly did we expect? Then we “solve” this by having AI treat us as narcissists out of the gate? Seems like a move in the wrong direction.

Epoch AI (@epochairesearch)'s Twitter Profile Photo

Is GPQA Diamond tapped out? Recent top scores have clustered around 83%. Could the other 17% of questions be flawed? In this week’s Gradient Update, Greg Burnham digs into this popular benchmark. His conclusion: reports of its demise are probably premature.
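
A back-of-envelope check (my own arithmetic, not from the Gradient Update): whether "17% flawed" explains an 83% ceiling depends on what models score on those flawed items, since GPQA is four-option multiple choice.

```python
# Toy ceiling calculation (my own arithmetic, not Epoch's analysis): the accuracy
# a model could reach if it answered every sound question correctly, under two
# assumptions about the flawed remainder. GPQA is four-option multiple choice,
# so chance is 25%.
def ceiling(flawed_fraction: float, chance: float = 0.25) -> tuple[float, float]:
    sound = 1.0 - flawed_fraction
    never_credited = sound                                 # flawed items always marked wrong
    guessed_at_chance = sound + flawed_fraction * chance   # flawed items guessed at 25%
    return never_credited, guessed_at_chance

print(ceiling(0.17))  # (0.83, 0.8725)
```

An 83% ceiling lines up with ~17% of questions being flawed only if models get essentially no credit on those items; if they merely guessed at chance on them, the ceiling would sit closer to 87%.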

Zifan (Sail) Wang (@_zifan_wang)'s Twitter Profile Photo

🧵 (1/6) Bringing together diverse mindsets – from in-the-trenches red teamers to ML & policy researchers – we wrote a position paper arguing for crucial research priorities for red teaming frontier models, followed by a roadmap towards system-level safety, AI monitoring, and …
