Jon Richens (@jonathanrichens)'s Twitter Profile
Jon Richens

@jonathanrichens

Research scientist in AI safety @DeepMind

ID: 1293230988924014595

Joined: 11-08-2020 17:01:20

99 Tweets

473 Followers

228 Following

Logan Graham (@logangraham)'s Twitter Profile Photo

I’m hiring ambitious Research Scientists at Anthropic to measure and prepare for models acting autonomously in the world. This is one of the most novel and difficult capabilities to measure, and critical for safety. Join the Frontier Red Team at Anthropic:

Joshua Loftus (@joftius)'s Twitter Profile Photo

This year #ICML started a "position paper" track aimed at stimulating discussions. Reader, I chose violence...

𝗧𝗵𝗲 𝗖𝗮𝘂𝘀𝗮𝗹 𝗥𝗲𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗡𝗲𝗲𝗱𝘀 𝗦𝗰𝗶𝗲𝗻𝘁𝗶𝗳𝗶𝗰 𝗣𝗿𝗮𝗴𝗺𝗮𝘁𝗶𝘀𝗺

Full text: arxiv.org/abs/2406.02275
Richard Ngo (@richardmcngo)'s Twitter Profile Photo

If I talk to one more person who says “but even if this research direction led to a massive breakthrough in our scientific understanding of neural networks/deep learning/agent foundations, how would that help with AI safety?” I will become the joker.

Tom Everitt (@tom4everitt)'s Twitter Profile Photo

What if LLMs are sometimes capable of doing a task but don't try hard enough to do it?

In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
Alexis Bellot (@alexis_bellot_)'s Twitter Profile Photo

Can we trust a black-box system, when all we know is its past behaviour? 🤖🤔
In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
Richard Suwandi @ICLR2025 (@richardcsuwandi)'s Twitter Profile Photo

2 years ago, <a href="/ilyasut/">Ilya Sutskever</a> made a bold prediction that large neural networks are learning world models through text.

Recently, a new paper by <a href="/GoogleDeepMind/">Google DeepMind</a> provided a compelling insight to this idea. They found that if an AI agent can tackle complex, long-horizon tasks, it must
Mario Giulianelli (@glnmario) 's Twitter Profile Photo

I will be a SPAR mentor this Fall 🤖 Check out the programme and apply by 20 August to work with me on formalising and/or measuring and/or intervening on goal-directed behaviour in AI agents. More info on potential projects here 🧵