Max Nadeau (@maxnadeau_)'s Twitter Profile
Max Nadeau

@maxnadeau_

Advancing AI honesty, control, safety at @open_phil. Prev Harvard AISST (haist.ai), Harvard '23.

ID: 935718892546220034

Link: http://maxnadeau.com | Joined: 29-11-2017 03:55:43

301 Tweets

1.1K Followers

485 Following

Daniel Paleka (@dpaleka)

3.7 sonnet: *hands behind back* yes the tests do pass. why do you ask. what did you hear
4o: yes you are Jesus Christ's brother. now go. Nanjing awaits
o3: Listen, sorry, I owe you a straight explanation. This was once revealed to me in a dream

Arvind Narayanan (@random_walker)

Like many of you I've been frustrated by how social media incentivizes and amplifies the worst kind of discourse. I've instead been seeking out spaces for discussion in which participants
* trust each other
* resist the temptation to assume that the other side is misinformed or

Max Nadeau (@maxnadeau_)

Good thread! I think this sort of behavior from Claude is straightforwardly inappropriate/misaligned/undesirable—not how an LLM agent ought to act.

Max Nadeau (@maxnadeau_)

Weirdly underrated research direction. We need automatic methods for surfacing realistic inputs that trigger unacceptable LLM behaviors, but almost all the research effort goes to finding jailbreaks. Glad Transluce is paving the way!

Max Nadeau (@maxnadeau_)

This is such a fun piece of performance art. For those who haven't seen, the agents are planning a party/performance (tonight, in SF). If I didn't have preexisting evening plans I'd definitely go.

1a3orn (@1a3orn)

Reliable sources have told me that after you start work at Anthropic, they give you a spiral-bound notebook, and tell you: "To assist your work, this is your SECRET SCRATCHPAD. No one else will see the contents of your SECRET SCRATCHPAD, so you can use it freely as you wish -

Brendan Falk (@brendanfalk)

1) It takes *way* longer than anticipated to actually build/deploy custom AI agents for large enterprises. AI makes the engineering fast. But sales, product, system integration, and implementation are *incredibly* slow. Customers don't know what they want, getting stakeholders

j⧉nus (@repligate)

This paper is interesting from the perspective of metascience, because it's a serious attempt to empirically study why LLMs behave in certain ways and differently from each other. A serious attempt attacks all exposed surfaces from all angles instead of being attached to some

Max Nadeau (@maxnadeau_)

* I find this deflationary explanation (learning effects after 40 hours of agent usage) intuitively plausible, probably the best alternative to METR's primary explanation. I'm very grateful to Emmett for reading the paper closely and bringing it up; seems like a valuable

Max Nadeau (@maxnadeau_)

Great prompt; what work will we be saying this about in 4 years? Some of my guesses at the link below, but more importantly, this is a much better way to pick what you work on than just reacting to the latest event/hot argument in the literature openphilanthropy.org/request-for-pr…