Victoria Krakovna (@vkrakovna)'s Twitter Profile
Victoria Krakovna

@vkrakovna

Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute @flixrisk. Views are my own and do not represent GDM or FLI.

ID: 2541954109

Link: http://vkrakovna.wordpress.com | Joined: 02-06-2014 18:12:22

1.1K Tweets

9.9K Followers

457 Following

Seán Ó hÉigeartaigh (@s_oheigeartaigh)'s Twitter Profile Photo

The ILINA fellowship is open to candidates across Africa; its fellows are working on everything from the geopolitics of AI to catastrophic climate change.

Rohin Shah (@rohinmshah)'s Twitter Profile Photo

Despite the constant arguments on p(doom), many agree that *if* AI systems become highly capable in risky domains, *then* we ought to mitigate those risks. So we built an eval suite to see whether AI systems are highly capable in risky domains. x.com/tshevl/status/…

Victoria Krakovna (@vkrakovna)'s Twitter Profile Photo

Excited to share our latest work on evaluating frontier models for potentially dangerous capabilities (persuasion, cyber-offense, self-proliferation, and self-reasoning) arxiv.org/abs/2403.13793

Anca Dragan (@ancadianadragan)'s Twitter Profile Photo

So excited and so very humbled to be stepping in to head AI Safety and Alignment at Google DeepMind. Lots of work ahead, both for present-day issues and for extreme risks in anticipation of capabilities advancing.

Linch (@linchzhang)'s Twitter Profile Photo

I’m proud to announce the April 1 launch of my new startup, Open Asteroid Impact! We redirect asteroids towards Earth for the benefit of humanity. Our mission is to have as high an impact as possible. 🚀☄️🌎💸💸💸 More details in 🧵:

David Krueger (@davidskrueger)'s Twitter Profile Photo

I’m super excited to release our 100+ page collaborative agenda - led by Usman Anwar - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below...

Iason Gabriel (@iasongabriel)'s Twitter Profile Photo

1. What are the ethical and societal implications of advanced AI assistants? What might change in a world with more agentic AI? Our new paper explores these questions: storage.googleapis.com/deepmind-media… It’s the result of a one-year research collaboration involving 50+ researchers… a 🧵

Zac Kenton (@zackenton1)'s Twitter Profile Photo

Big new paper on the Ethics of Advanced AI Assistants led by Iason Gabriel, Arianna Manzini, and Geoff Keeling in collaboration with many authors! A broad study encompassing many aspects of AI ethics and safety. Was an honour to write the chapter on Safety, thanks to my co-authors 1/5

Murray Shanahan (@mpshanahan)'s Twitter Profile Photo

I am very saddened to learn that Dan Dennett has died. Dan had an enormous influence on my thinking. I met him quite a few times over the years, and it was always a pleasure to spend time with him.

Allan Dafoe (@allandafoe)'s Twitter Profile Photo

We are looking for an AGI Safety Manager to support Google DeepMind's AGI Safety Council: please encourage excellent people to apply! This role will work closely with my team, Scalable Alignment and Safety, and Responsible Development and Innovation. boards.greenhouse.io/deepmind/jobs/…

Zac Kenton (@zackenton1)'s Twitter Profile Photo

Our new paper on AI persuasion, exploring definitions, harms and mechanisms. Happy to have contributed towards the section on mitigations to avoid harmful persuasion. Some highlights in 🧵 storage.googleapis.com/deepmind-media…

Neel Nanda (@neelnanda5)'s Twitter Profile Photo

Announcing the first Mechanistic Interpretability workshop, held at ICML 2024! We have a fantastic speaker line-up (Chris Olah, Jacob Steinhardt, David Bau, Asma Ghandeharioun), $1,750 in best paper prizes, and a lot of recent progress to discuss! Paper deadline: May 29, either 8 or 4 pages

Allan Dafoe (@allandafoe)'s Twitter Profile Photo

As we push the boundaries of AI, it's critical that we stay ahead of potential risks. I'm thrilled to announce Google DeepMind's Frontier Safety Framework - our approach to analyzing and mitigating future risks posed by advanced AI models. 1/N deepmind.google/discover/blog/…

AI Safety Institute (@aisafetyinst)'s Twitter Profile Photo

We are announcing new grants for research into systemic AI safety. Initially backed by up to £8.5 million, this program will fund researchers to advance the science underpinning AI safety. Read more: gov.uk/government/new…

FAR.AI (@farairesearch)'s Twitter Profile Photo

What do AI safety experts believe about the future of AI? 🤖 How might things go wrong, what should we do, and how are we doing so far? We conducted 17 semi-structured interviews with AI safety experts to find out. 🎙️ See 🧵 for results 👇

Zac Kenton (@zackenton1)'s Twitter Profile Photo

Eventually, humans will need to supervise superhuman AI - but how? Can we study it now? We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself. Does this work? It’s complicated: 🧵👇

Owain Evans (@owainevans_uk)'s Twitter Profile Photo

New paper: We measure *situational awareness* in LLMs, i.e. a) Do LLMs know they are LLMs and act as such? b) Are LLMs aware when they’re deployed publicly vs. tested in-house? If so, this undermines the validity of the tests! We evaluate 19 LLMs on 16 new tasks 🧵

Neel Nanda (@neelnanda5)'s Twitter Profile Photo

Are you excited about Chris Olah-style mechanistic interpretability research? I'm looking to mentor scholars via MATS - apply by Aug 30! I'm impressed by the work from past scholars, and love mentoring promising talent. You don't need to be in a big lab to do good mech interp work!

Victoria Krakovna (@vkrakovna)'s Twitter Profile Photo

Awesome work from our mechanistic interpretability team, with some fun demos for discovering features inside the model: deepmind.google/discover/blog/…

Anca Dragan (@ancadianadragan)'s Twitter Profile Photo

So freaking proud of the AGI safety & alignment team -- read here a retrospective of the work over the past 1.5 years across frontier safety, oversight, interpretability, and more. Onwards! alignmentforum.org/posts/79BPxvSs…