Max Nadeau (@maxnadeau_) Twitter Tweets • TwiCopy

Max Nadeau

@maxnadeau_

Advancing AI honesty, control, safety at @open_phil. Prev Harvard AISST (haist.ai), Harvard '23.

ID: 935718892546220034

Link: http://maxnadeau.com · Joined: 29-11-2017 03:55:43

301 Tweets

1.1K Followers

485 Following

Daniel Paleka

@dpaleka

8 months ago

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear 4o: yes you are Jesus Christ's brother. now go. Nanjing awaits o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

Likes: 3.3K · Replies: 36 · Reposts: 269

Alexander Berger

@albrgr

7 months ago

Hmm maybe we should have just been funding this guy x.com/albrgr/status/…

Likes: 190 · Replies: 5 · Reposts: 8

Arvind Narayanan

@random_walker

7 months ago

Like many of you I've been frustrated by how social media incentivizes and amplifies the worst kind of discourse. I've instead been seeking out spaces for discussion in which participants * trust each other * resist the temptation to assume that the other side is misinformed or

Likes: 147 · Replies: 3 · Reposts: 27

Max Nadeau

@maxnadeau_

7 months ago

We really are in a moment of perplexity, aren't we

Likes: 13 · Replies: 0 · Reposts: 0

Max Nadeau

@maxnadeau_

7 months ago

Good thread! I think this sort of behavior from Claude is straightforwardly inappropriate/misaligned/undesirable—not how an LLM agent ought to act.

Likes: 7 · Replies: 0 · Reposts: 0

Max Nadeau

@maxnadeau_

7 months ago

Wild stuff. And as usual, remember that this is the least rich and internally-detailed that these worlds will ever be!

Likes: 4 · Replies: 0 · Reposts: 0

Max Nadeau

@maxnadeau_

6 months ago

Weirdly underrated research direction. We need automatic methods for surfacing realistic inputs that trigger unacceptable LLM behaviors, but almost all the research effort goes to finding jailbreaks. Glad Transluce is paving the way!

Likes: 14 · Replies: 1 · Reposts: 0

Max Nadeau

@maxnadeau_

6 months ago

My views are similar.

Likes: 2 · Replies: 0 · Reposts: 0

Max Nadeau

@maxnadeau_

6 months ago

This is such a fun piece of performance art. For those who haven't seen, the agents are planning a party/performance (tonight, in SF). If I didn't have preexisting evening plans I'd definitely go.

Likes: 2 · Replies: 0 · Reposts: 0

Max Nadeau

@maxnadeau_

6 months ago

Really interesting thread, contrary to my assumptions about scale. Thanks for putting it together Naomi Saphra!

Likes: 0 · Replies: 0 · Reposts: 0

1a3orn

@1a3orn

6 months ago

Reliable sources have told me that after you start work at Anthropic, they give you a spiral-bound notebook, and tell you: "To assist your work, this is your SECRET SCRATCHPAD. No one else will see the contents of your SECRET SCRATCHPAD, so you can use it freely as you wish -

Likes: 437 · Replies: 2 · Reposts: 22

Brendan Falk

@brendanfalk

6 months ago

1) It takes *way* longer than anticipated to actually build/deploy custom AI agents for large enterprises. AI makes the engineering fast. But sales, product, system integration, and implementation are *incredibly* slow. Customers don't know what they want, getting stakeholders

Likes: 560 · Replies: 28 · Reposts: 33

j⧉nus

@repligate

5 months ago

This paper is interesting from the perspective of metascience, because it's a serious attempt to empirically study why LLMs behave in certain ways and differently from each other. A serious attempt attacks all exposed surfaces from all angles instead of being attached to some

Likes: 95 · Replies: 3 · Reposts: 11

Max Nadeau

@maxnadeau_

5 months ago

* I find this deflationary explanation (learning effects after 40 hours of agent usage) intuitively plausible, probably the best alternative to METR's primary explanation. I'm very grateful to Emmett for reading the paper closely and bringing it up; seems like a valuable

Likes: 86 · Replies: 4 · Reposts: 4

Xander Davies

@alxndrdavies

5 months ago

We at AI Security Institute worked with OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4

Likes: 135 · Replies: 3 · Reposts: 24

Max Nadeau

@maxnadeau_

5 months ago

Great prompt; what work will we be saying this about in 4 years? Some of my guesses at the link below, but more importantly, this is a much better way to pick what you work on than just reacting to the latest event/hot argument in the literature openphilanthropy.org/request-for-pr…

Likes: 2 · Replies: 0 · Reposts: 0