Lewis Ho's (@_lewisho) Twitter Profile
Lewis Ho

@_lewisho

Research Scientist at Google DeepMind

ID: 877336277330165760

Joined: 21-06-2017 01:24:03

38 Tweets

232 Followers

165 Following

METR (@metr_evals)'s Twitter Profile Photo

We're excited to share a proposal for evals-based catastrophic risk reduction that AI developers can adopt today: Responsible Scaling Policies (RSPs) that establish conditions under which it would be unsafe to continue advancing AI capabilities without additional safety measures.

harry law (hopfield network truther) (@lawhsw)'s Twitter Profile Photo

1/9 Amidst lots of discussion about what an appropriate international governance regime for AI might look like, Lewis Ho and I wrote for @nature about whether an organisation with a ‘dual mandate’ to manage risk and spread benefits could be a promising model to explore

Zach Freitas-Groff 🔸 (@zdgroff)'s Twitter Profile Photo

📈Job market paper time📉 I’m excited to finally share my job market paper! My JMP studies whether and why policy choices are stubbornly persistent. For example, Oregon has an income tax, but Washington doesn’t—seemingly because of nearly century-old choices. Is this typical?

Toby (@tshevl)'s Twitter Profile Photo

In 2024, the AI community will develop more capable AI systems than ever before. How do we know what new risks to protect against, and what the stakes are? Our research team at Google DeepMind built a set of evaluations to measure potentially dangerous capabilities: 🧵

Lewis Ho (@_lewisho)'s Twitter Profile Photo

GDM's 1st step towards the ambitious ideals of responsible scaling, these being: identifying AI capabilities that pose severe risk, using evals to detect such capabilities, preparing and articulating mitigation plans, and involving external parties in the process as appropriate.

Sarah Cogan (@sarah_cogan)'s Twitter Profile Photo

Curious about how we evaluate dangerous capabilities at Google DeepMind? 🤔 The Frontier Safety team just open-sourced resources for our in-house CTF & self-proliferation challenges! Check it out: github.com/google-deepmin…

Allan Dafoe (@allandafoe)'s Twitter Profile Photo

We are hiring! Google DeepMind's Frontier Safety and Governance team is dedicated to mitigating frontier AI risks; we work closely with technical safety, policy, responsibility, security, and GDM leadership. Please encourage great people to apply! 1/ boards.greenhouse.io/deepmind/jobs/…

Séb Krier (@sebkrier)'s Twitter Profile Photo

Are you tired of reading bad Twitter takes on AGI governance? Do you want to work on some of the most exciting and thorny questions relating to AGI safety and governance? Then you should apply for this Research Scientist position with the Frontier Safety & Governance team ASAP.

Chris Painter (@chrispainteryup)'s Twitter Profile Photo

We thought it would be helpful to have all of the similar themes/components from each of DeepMind's Frontier Safety Framework, OpenAI's Preparedness Framework, and Anthropic's Responsible Scaling Policy in one place.

David Lindner (@davlindner)'s Twitter Profile Photo

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵

Allan Dafoe (@allandafoe)'s Twitter Profile Photo

I'm proud of GoogleDeepMind/Google's v2 update to our Frontier Safety Framework. We were the first major tech company to produce an explicit risk management framework for extreme risks, and I'm glad we are continuing to push ahead on safety best practice. deepmind.google/discover/blog/…

Victoria Krakovna (@vkrakovna)'s Twitter Profile Photo

We are excited to release a short course on AGI safety! The course offers a concise and accessible introduction to AI alignment problems and our technical & governance approaches, consisting of short recorded talks and exercises (75 minutes total). deepmindsafetyresearch.medium.com/1072adb7912c

Allan Dafoe (@allandafoe)'s Twitter Profile Photo

Thanks Rob for a great conversation about important topics: why technology drives history, and the rare opportunity of steering it.

Rohin Shah (@rohinmshah)'s Twitter Profile Photo

We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.

Atul Gawande (@atul_gawande)'s Twitter Profile Photo

Yesterday, Rubio terminated 5800 USAID contracts – more than 90% of its foreign aid programs – in defiance of the courts. Here’s a list of just some of the lifesaving awards that were terminated. Nearly all were Congressionally mandated. They've saved millions of lives. 🧵

Rohin Shah (@rohinmshah)'s Twitter Profile Photo

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.

Lewis Ho (@_lewisho)'s Twitter Profile Photo

We have updated the Gemini 2.5 Pro model card with results from our FSF evaluations. These continue to be critical for helping us understand how to keep our systems safe amidst the dizzyingly impressive capability improvements.