Jon Richens (@jonathanrichens)'s Twitter Profile
Jon Richens

@jonathanrichens

Research scientist in AI safety @DeepMind

ID: 1293230988924014595

Joined: 11-08-2020 17:01:20

99 Tweets

473 Followers

228 Following

Logan Graham (@logangraham)'s Twitter Profile Photo

I'm hiring ambitious Research Scientists at Anthropic to measure and prepare for models acting autonomously in the world. This is one of the most novel and difficult capabilities to measure, and critical for safety. Join the Frontier Red Team at Anthropic:

Joshua Loftus (@joftius)'s Twitter Profile Photo

This year #ICML started a "position paper" track aimed at stimulating discussions. Reader, I chose violence...

The Causal Revolution Needs Scientific Pragmatism

Full text: arxiv.org/abs/2406.02275
Richard Ngo (@richardmcngo)'s Twitter Profile Photo

If I talk to one more person who says “but even if this research direction led to a massive breakthrough in our scientific understanding of neural networks/deep learning/agent foundations, how would that help with AI safety?” I will become the joker.

Tom Everitt (@tom4everitt)'s Twitter Profile Photo

What if LLMs are sometimes capable of doing a task but don't try hard enough to do it?

In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
Alexis Bellot (@alexis_bellot_)'s Twitter Profile Photo

Can we trust a black-box system, when all we know is its past behaviour? 🤖🤔
In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
Richard Suwandi @ICLR2025 (@richardcsuwandi)'s Twitter Profile Photo

2 years ago, Ilya Sutskever made a bold prediction that large neural networks are learning world models through text.

Recently, a new paper by Google DeepMind provided a compelling insight into this idea. They found that if an AI agent can tackle complex, long-horizon tasks, it must
Mario Giulianelli (@glnmario)'s Twitter Profile Photo

I will be a SPAR mentor this Fall 🤖 Check out the programme and apply by 20 August to work with me on formalising and/or measuring and/or intervening on goal-directed behaviour in AI agents. More info on potential projects here 🧵