Gillian Hadfield (@ghadfield)'s Twitter Profile
Gillian Hadfield

@ghadfield

AI policy and alignment; integrating law, economics & computer science to build normatively competent AI that knows how to play well with humans

ID: 29931309

Link: https://gillianhadfield.org/
Joined: 09-04-2009 05:41:50

2.2K Tweets

4.4K Followers

754 Following

Dylan Hadfield-Menell (@dhadfieldmenell)'s Twitter Profile Photo

It’s somewhat nice getting to see who actually cared about AI harms and who was just using it as cover for other aims. 2 camps I notice:

Dylan Hadfield-Menell (@dhadfieldmenell)'s Twitter Profile Photo

Put this in the category of empirical results that were predicted by AI safety researchers. But, of course, the concern that AI systems would manipulate people for ulterior motives is “science fiction” and like “worrying about overpopulation on Mars.” Great work.

Atoosa Kasirzadeh (@dr_atoosa)'s Twitter Profile Photo

In this review paper, we advocate for the normalization of AI safety as an inherent component of AI development and deployment. AI safety should be a standard practice integrated into every stage of AI creation and deployment. Developing and deploying safe AI should be a

Gillian Hadfield (@ghadfield)'s Twitter Profile Photo

Everyone, including those who think we're building powerful AI to improve lives for everyone, should take seriously how poorly our current economic indicators (unemployment, earnings, inflation) capture the well-being of low- and moderate-income folks. politico.com/news/magazine/…

Dylan Hadfield-Menell (@dhadfieldmenell)'s Twitter Profile Photo

The field of AI has overfit to easy-to-measure objectives. Once we can measure something, we can make the number go up. This would be valuable if measuring value were easy. But measuring value is hard, so AI usually underperforms when you move to real tasks.

Ethan Mollick (@emollick)'s Twitter Profile Photo

As an academic, the relative silence of the humanities and many social sciences about the future of our world with AI (besides far too many saying "AI is bad" without nuance), is a shame. These are fields with a lot to say about the nature of being human, silent at a key moment.

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Early signs of deception, cheating & self-preservation in top-performing models in terms of reasoning are extremely worrisome. We don't know how to guarantee AI won't have undesired behavior to reach goals & this must be addressed before deploying powerful autonomous agents.

Cooperative AI Foundation (@coop_ai)'s Twitter Profile Photo

The development and widespread deployment of advanced AI agents will give rise to multi-agent systems of unprecedented complexity. A new report from staff at CAIF and a host of leading researchers explores the novel and under-appreciated risks these systems pose. Details below.

Jack Clark (@jackclarksf)'s Twitter Profile Photo

If you want to build and deploy powerful AI systems you need to evaluate them for capabilities and potential national security risks. Recently, governments have stood up orgs for companies to work with on the natsec part of this and these have been extraordinarily helpful.

Séb Krier (@sebkrier)'s Twitter Profile Photo

One of the most underrated areas of AI governance is cooperative AI research. Alignment is important but may be insufficient for good outcomes. Using AI to help solve cooperation problems seems very important to me. See these excerpts from Allan Dafoe's chat with Rob Wiblin.

Dylan Hadfield-Menell (@dhadfieldmenell)'s Twitter Profile Photo

If you pretend that xrisk from ASI misalignment is some novel, incredibly complex failure mode (instead of a simple extrapolation of established theories of incentive design), you blind people to the evidence for, and predictive power of, the theories that motivate the risk.

John Arnold (@johnarnoldfndtn)'s Twitter Profile Photo

We at Arnold Ventures funded a pilot to bring a Nordic-style restorative justice model to a prison in PA and assess its impact. The question was whether it could work within a vastly different criminal justice system. Initial results are so promising that PA is expanding the

Dylan Hadfield-Menell (@dhadfieldmenell)'s Twitter Profile Photo

Because it is a bad idea to assume your validator has no bugs. Any approach that assumes a perfect validator is doomed to fail except in certain narrow applications. Most AI approaches implicitly or explicitly assume a perfect validator.

Gillian Hadfield (@ghadfield)'s Twitter Profile Photo

This is a really important result for a lot of people working in alignment — the assumption that we can prompt, or rely on in-context learning, to reliably reflect specific values is pretty widespread.

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Very relevant piece by Kevin Roose in The New York Times, 3 points that particularly resonate with me: 1⃣ AGI's arrival raises major economic, political and technological questions to which we currently have no answers. 2⃣ If we're in denial (or simply not paying attention), we could

Gillian Hadfield (@ghadfield)'s Twitter Profile Photo

I avoid politics here, but this is just so morally outrageous: a Black man awarded the Medal of Honor in 1970 by Richard Nixon for his brave service in Vietnam has his page scrubbed by the Department of Defense, with "deimedal" inserted in the URL. theguardian.com/us-news/2025/m…

Arvind Narayanan (@random_walker)'s Twitter Profile Photo

At a recent Princeton University panel I was asked if the crisis created by AI is also an opportunity for fundamental changes to higher ed. Yes! I’ve been thinking and writing about this since before ChatGPT’s release. I see two big opportunities. The first is to separate