Jon Richens (@jonathanrichens)'s Twitter Profile
Jon Richens

@jonathanrichens

Research scientist in AI safety @DeepMind

ID: 1293230988924014595

Joined: 11-08-2020 17:01:20

99 Tweets

473 Followers

228 Following

Logan Graham (@logangraham)'s Twitter Profile Photo

I'm hiring ambitious Research Scientists at Anthropic to measure and prepare for models acting autonomously in the world. This is one of the most novel and difficult capabilities to measure, and critical for safety. Join the Frontier Red Team at Anthropic:

Joshua Loftus (@joftius)'s Twitter Profile Photo

This year #ICML started a "position paper" track aimed at stimulating discussions. Reader, I chose violence...

The Causal Revolution Needs Scientific Pragmatism

Full text: arxiv.org/abs/2406.02275
Richard Ngo (@richardmcngo)'s Twitter Profile Photo

If I talk to one more person who says “but even if this research direction led to a massive breakthrough in our scientific understanding of neural networks/deep learning/agent foundations, how would that help with AI safety?” I will become the joker.

Tom Everitt (@tom4everitt)'s Twitter Profile Photo

What if LLMs are sometimes capable of doing a task but don't try hard enough to do it?

In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
Alexis Bellot (@alexis_bellot_)'s Twitter Profile Photo

Can we trust a black-box system, when all we know is its past behaviour? 🤖🤔
In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
Richard Suwandi @ICLR2025 (@richardcsuwandi)'s Twitter Profile Photo

2 years ago, Ilya Sutskever made a bold prediction that large neural networks are learning world models through text.

Recently, a new paper by Google DeepMind provided a compelling insight into this idea. They found that if an AI agent can tackle complex, long-horizon tasks, it must
Mario Giulianelli (@glnmario)'s Twitter Profile Photo

I will be a SPAR mentor this Fall 🤖 Check out the programme and apply by 20 August to work with me on formalising and/or measuring and/or intervening on goal-directed behaviour in AI agents. More info on potential projects here 🧵