Neel Nanda (@neelnanda5)'s Twitter Profile
Neel Nanda

@neelnanda5

Mechanistic Interpretability lead at DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!

ID: 1542528075128348674

Link: http://neelnanda.io · Joined: 30-06-2022 15:18:58

4.4K Tweets

25.25K Followers

117 Following

Neel Nanda (@neelnanda5):

The call for papers for the NeurIPS Mechanistic Interpretability Workshop is open!

Max 4 or 9 pages, due 22 Aug, NeurIPS submissions welcome

We welcome any works that further our ability to use the internals of a model to better understand it

Details: mechinterpworkshop.com
Samuel Marks (@saprmarks):

xAI launched Grok 4 without any documentation of their safety testing. This is reckless and breaks with industry best practices followed by other major AI labs. If xAI is going to be a frontier AI developer, they should act like one. 🧵

Mikita Balesni 🇺🇦 (@balesni):

Also, in practice, current reasoning AIs really seem to want to say their reasoning out loud. Even if you tell them not to, they often can’t help themselves and will blab anyway.

Rohin Shah (@rohinmshah):

Chain of thought monitoring looks valuable enough that we’ve put it in our Frontier Safety Framework to address deceptive alignment. This paper is a good explanation of why we’re optimistic – but also why it may be fragile, and what to do to preserve it. x.com/balesni/status…

Neel Nanda (@neelnanda5):

It was great to be part of this statement. I wholeheartedly agree. It is a wild lucky coincidence that models often express dangerous intentions aloud, and it would be foolish to waste this opportunity. It is crucial to keep chain of thought monitorable for as long as possible.

Daniel Kokotajlo (@dkokotajlo):

I'm very happy to see this happen. I think that we're in a vastly better position to solve the alignment problem if we can see what our AIs are thinking, and I think that we sorta mostly can right now, but that by default, in the future, companies will move away from this paradigm.

Neel Nanda (@neelnanda5):

If you want to write about AI, this is a great opportunity - I'm impressed by Asterisk's work, and based on chats with the organisers, the plans here are well thought through.

AI is reshaping the world but oft misunderstood. I want more good, high-fidelity public writing on it.