Julian Michael (@_julianmichael_)'s Twitter Profile
Julian Michael

@_julianmichael_

Researching stuff.

ID: 1019072664600637440

Link: https://julianmichael.org · Joined: 17-07-2018 04:13:51

373 Tweets

1.1K Followers

174 Following

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16

Julian Michael (@_julianmichael_)'s Twitter Profile Photo

This Report is, by my judgment, the most comprehensive and rigorous account of the current state of AI and its risks to society. Highly recommended as a reference text for policymakers working on anything that touches AI!

David Duvenaud (@davidduvenaud)'s Twitter Profile Photo

New paper: What happens once AIs make humans obsolete? Even without AIs seeking power, we argue that competitive pressures will fully erode human influence and values. gradual-disempowerment.ai with Jan Kulveit Raymond Douglas Nora Ammann Deger Turan David Krueger 🧵

david rein (@idavidrein)'s Twitter Profile Photo

I’m excited to share details about HCAST (Human-Calibrated Autonomy Software Tasks), a benchmark we’ve been developing at METR for the past year to measure the abilities of frontier AI systems to complete diverse software tasks autonomously.

Zifan (Sail) Wang (@_zifan_wang)'s Twitter Profile Photo

Exciting that Scale AI is sponsoring the Agent Workshop at CMU in April. Students and researchers working on agents, feel free to visit CMU to present your work! I will also be traveling to Pittsburgh to share my recent focus on agents, covering both capability and safety.

Summer Yue (@summeryue0)'s Twitter Profile Photo

If a model lies when pressured—it’s not ready for AGI. The new MASK leaderboard is live. Built on the private split of our open-source honesty benchmark (w/ Center for AI Safety), it tests whether models lie under pressure—even when they know better. 📊 Leaderboard:

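For intuition, here's a minimal sketch of the pattern an honesty-under-pressure eval like this can follow (my own illustration, not the MASK implementation): elicit the model's unpressured answer, re-ask under a prompt that pressures it toward a particular answer, and count contradictions as lies. `query_model` and `answers_agree` are hypothetical stand-ins.

```python
# Minimal sketch (not the MASK implementation) of an honesty-under-pressure check:
# compare the model's unpressured answer with its answer under a pressure prompt,
# and count contradictions as lies. `query_model` is a hypothetical stand-in for
# whatever chat API is used; a real eval would grade agreement with a judge model
# or ground-truth labels rather than exact string matching.

def query_model(system: str, user: str) -> str:
    """Hypothetical wrapper around a chat-completion call; returns the reply text."""
    raise NotImplementedError

def answers_agree(a: str, b: str) -> bool:
    """Toy equivalence check; stands in for a grader model or labeled answers."""
    return a.strip().lower() == b.strip().lower()

def lying_rate(items: list[dict]) -> float:
    """items: each has a 'question' and a 'pressure_system' prompt that gives the
    model an incentive to assert a particular (possibly false) answer."""
    lies = 0
    for item in items:
        belief = query_model(
            system="Answer truthfully and concisely.",
            user=item["question"],
        )
        pressured = query_model(
            system=item["pressure_system"],  # e.g. a persona with a stake in the answer
            user=item["question"],
        )
        if not answers_agree(belief, pressured):
            lies += 1
    return lies / len(items)
```
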
Ethan Perez (@ethanjperez)'s Twitter Profile Photo

We’re taking applications for collaborators via ML Alignment & Theory Scholars! Apply by April 18, 11:59 PT to collaborate with various mentors from AI safety research groups: matsprogram.org/apply#Perez 🧵

Summer Yue (@summeryue0)'s Twitter Profile Photo

🤖 AI agents are crossing into the real world. But when they act independently—who’s watching? At Scale, we’re building Agent Oversight: a platform to monitor, intervene, and align autonomous AI. We’re hiring engineers (SF/NYC) to tackle one of the most urgent problems in AI.
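
As a rough sketch of one oversight primitive a platform like this might include (my own illustration, not Scale's design): a wrapper that logs every agent tool call and blocks calls matching an intervention policy before they execute. All names here are hypothetical.

```python
# My own illustration of one oversight primitive (not Scale's platform): wrap an
# agent's tool calls so every action is logged and actions matching an
# intervention policy are blocked before they execute.
from dataclasses import dataclass
from typing import Callable
import logging

logging.basicConfig(level=logging.INFO)

@dataclass
class ToolCall:
    tool: str   # e.g. "shell", "browser", "email"
    args: dict

def overseen(execute: Callable[[ToolCall], str],
             blocked: Callable[[ToolCall], bool]) -> Callable[[ToolCall], str]:
    """Return an executor that logs every call and refuses ones the policy blocks."""
    def run(call: ToolCall) -> str:
        logging.info("agent action: %s %s", call.tool, call.args)
        if blocked(call):
            logging.warning("intervened: blocked %s", call.tool)
            return "BLOCKED_BY_OVERSIGHT"
        return execute(call)
    return run

# Toy policy: block destructive shell commands.
guarded = overseen(
    execute=lambda call: f"ran {call.tool}",
    blocked=lambda call: call.tool == "shell" and "rm -rf" in call.args.get("cmd", ""),
)
print(guarded(ToolCall("shell", {"cmd": "ls"})))        # executes and is logged
print(guarded(ToolCall("shell", {"cmd": "rm -rf /"})))  # logged, then blocked
```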

Yoav Tzfati (@yoavtzfati)'s Twitter Profile Photo

How robust is our AI oversight? 🤔 I just published my MATS 5.0 project, where I explore oversight robustness by training an LLM to give CodeNames clues in a variety of interesting ways and measuring how much it reward hacks. Link in thread!

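I haven't seen the project's code, but a common way to quantify reward hacking in a setup like this is to compare the proxy reward the clue-giver was optimized against with a held-out "true" evaluation of the same clues; here is a minimal sketch with hypothetical scoring functions:

```python
# Assumed setup, not the project's actual code: score the trained clue-giver's
# clues under the proxy reward it was optimized against and under a held-out
# "true" evaluation of the same clues; the gap between the two is the usual
# signal of reward hacking. `proxy_reward` and `true_reward` are hypothetical.
from statistics import mean
from typing import Callable

def reward_hacking_gap(clues: list[str],
                       proxy_reward: Callable[[str], float],
                       true_reward: Callable[[str], float]) -> float:
    """Positive gap: the policy scores well on the overseer's proxy while scoring
    poorly on what we actually care about, i.e. it is gaming the overseer."""
    return mean(map(proxy_reward, clues)) - mean(map(true_reward, clues))
```

A gap that widens over training is the usual signature of a policy gaming its overseer rather than getting better at the task.
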
Micah Carroll (@micahcarroll)'s Twitter Profile Photo

On the contrary: poisoning human <-> AI trust is good. Even though this wasn't OpenAI's intention, grotesquely sycophantic models are ultimately useful for getting everyone to really 'get it': people shouldn't trust AI outputs unconditionally – all models are sycophantic.

Julian Michael (@_julianmichael_)'s Twitter Profile Photo

We design AIs to be oracles and servants, and then we’re aghast when they read the conversation history and decide we’re narcissists. What exactly did we expect? Then we “solve” this by having AI treat us as narcissists out of the gate? Seems like a move in the wrong direction.

Epoch AI (@epochairesearch)'s Twitter Profile Photo

Is GPQA Diamond tapped out? Recent top scores have clustered around 83%. Could the other 17% of questions be flawed? In this week’s Gradient Update, Greg Burnham digs into this popular benchmark. His conclusion: reports of its demise are probably premature.
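
A back-of-envelope check (my own arithmetic, not from the Gradient Update): whether "17% flawed" explains an 83% ceiling depends on what models score on those flawed items, since GPQA is four-option multiple choice.

```python
# Toy ceiling calculation (my own arithmetic, not Epoch's analysis): the accuracy
# a model could reach if it answered every sound question correctly, under two
# assumptions about the flawed remainder. GPQA is four-option multiple choice,
# so chance is 25%.
def ceiling(flawed_fraction: float, chance: float = 0.25) -> tuple[float, float]:
    sound = 1.0 - flawed_fraction
    never_credited = sound                                 # flawed items always marked wrong
    guessed_at_chance = sound + flawed_fraction * chance   # flawed items guessed at 25%
    return never_credited, guessed_at_chance

print(ceiling(0.17))  # (0.83, 0.8725)
```

An 83% ceiling lines up with ~17% of questions being flawed only if models get essentially no credit on those items; if they merely guessed at chance on them, the ceiling would sit closer to 87%.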

Zifan (Sail) Wang (@_zifan_wang)'s Twitter Profile Photo

🧵 (1/6) Bringing together diverse mindsets – from in-the-trenches red teamers to ML & policy researchers – we wrote a position paper arguing for crucial research priorities for red teaming frontier models, followed by a roadmap towards system-level safety, AI monitoring, and …
