Dalton brown (@daltonbrown944)'s Twitter Profile
Dalton brown

@daltonbrown944

AI Alignment, AI ethics

ID: 1624063095663460352

Joined: 10-02-2023 15:10:01

56 Tweets

502 Followers

2.2K Following

Dan Hendrycks (@danhendrycks):

Since Senator Schumer is pushing for Congress to regulate AI, here are five promising AI policy ideas:
* external red teaming
* interagency oversight commission
* internal audit committees
* external incident investigation team
* safety research funding
(🧵 below)

Russ Salakhutdinov (@rsalakhu):

“Talent hits a target no one else can hit; Genius hits a target no one else can see” A. Schopenhauer I've seen Geoffrey Hinton hit targets no one thought existed. So even though I don't fully agree with Geoff’s views on AI risks, I'd still listen carefully to what he has to say.

David Krueger (@davidskrueger):

BTW, 2 papers I often recommend:
1) "The alignment problem from a deep learning perspective" by Richard Ngo et al., for a research overview: arxiv.org/abs/2209.00626
2) "Natural Selection Favors AIs over Humans" by Dan Hendrycks, for the argument for risk: arxiv.org/abs/2303.16200

Brad Neuberg (@bradneuberg):

Terence Tao, the famous mathematician, on using LLMs to aid in mathematical research: "2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process. When integrated with tools such as …

Leopold Aschenbrenner (@leopoldasch):

Intuitively, superhuman AI systems should "know" if they're acting safely. But can we "summon" such concepts from strong models with only weak supervision? Incredibly excited to finally share what we've been working on: weak-to-strong generalization. 1/ x.com/OpenAI/status/…
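For readers new to the idea, here is a toy analogue of the weak-to-strong setup described above: a small "weak supervisor" model is trained on ground truth, a larger "strong student" is then trained only on the supervisor's imperfect labels, and we measure how much of the student's ceiling performance those weak labels recover. The dataset, models, and metric below are illustrative stand-ins, not OpenAI's actual experiments.

```python
# Toy weak-to-strong generalization sketch (illustrative only; not OpenAI's code).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=40, n_informative=12,
                           shuffle=False, random_state=0)
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, test_size=0.66, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Weak supervisor: a simple model that only sees a few features, so its labels are imperfect.
weak = LogisticRegression(max_iter=1000).fit(X_sup[:, :5], y_sup)
weak_labels = weak.predict(X_train[:, :5])

# Strong student trained on ground truth (ceiling) vs. trained only on the weak labels.
strong_ceiling = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
strong_w2s = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

acc_weak = weak.score(X_test[:, :5], y_test)
acc_ceiling = strong_ceiling.score(X_test, y_test)
acc_w2s = strong_w2s.score(X_test, y_test)

# Performance gap recovered: what fraction of the weak-to-ceiling gap the student closes.
pgr = (acc_w2s - acc_weak) / (acc_ceiling - acc_weak)
print(f"weak={acc_weak:.3f}  weak-to-strong={acc_w2s:.3f}  ceiling={acc_ceiling:.3f}  PGR={pgr:.2f}")
```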

Ryan Carey (@ryancareyai):

Rapid AI progress arguably places at risk everything that we value, which includes wealth amounting to 5,000 times OpenAI's ~$90B market cap, and more. To mitigate this negative externality, OpenAI will allocate ~1/10,000 of its market cap. Doesn't really cut it.
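For scale, the arithmetic the tweet is pointing at, using only its own rough figures (a sketch, not independently verified):

```latex
% Rough scale comparison implied above: the tweet's own figures, not verified here.
\[
5{,}000 \times \$90\,\mathrm{B} \approx \$450\,\mathrm{T} \quad \text{(wealth framed as at risk)},
\qquad
\frac{\$90\,\mathrm{B}}{10{,}000} \approx \$9\,\mathrm{M} \quad \text{(the allocation)}.
\]
```

On those numbers the allocation is tens of millions of times smaller than the stake it is meant to protect, which is the mismatch being pointed out.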

Vinodkumar Prabhakaran (@vinodkpg):

Great to be presenting this work on cross-cultural and moral differences that shape perceptions of offense in language at the #NeurIPS2023 MP2 Workshop (aipsychphil.github.io). Poster at 2:50pm CT; room 255. Work led by Aida Davani, in collaboration w/ Mark Díaz and dylan.

Zac Kenton (@zackenton1):

In our new Google DeepMind paper, we redteam methods that aim to discover latent knowledge through unsupervised learning from LLM activation data. TL;DR: Existing methods can be easily distracted by other salient features in the prompt. arxiv.org/abs/2312.10029 🧵👇
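For context, the methods being red-teamed are unsupervised probes in the style of contrast-consistent search (CCS, Burns et al.): a probe is fit to contrast-pair activations with a consistency-plus-confidence loss and no labels. A minimal sketch, with random tensors standing in for real activations (an assumption for illustration, not the paper's code):

```python
# Minimal CCS-style unsupervised probe (illustrative sketch; not the paper's code).
import torch

def ccs_loss(probe, acts_pos, acts_neg):
    # probe maps an activation vector to P(statement is true)
    p_pos = torch.sigmoid(probe(acts_pos)).squeeze(-1)
    p_neg = torch.sigmoid(probe(acts_neg)).squeeze(-1)
    consistency = (p_pos - (1.0 - p_neg)) ** 2      # "x" and "not x" should get complementary probabilities
    confidence = torch.minimum(p_pos, p_neg) ** 2   # discourage the degenerate 0.5/0.5 solution
    return (consistency + confidence).mean()

d_model = 768                                        # hypothetical hidden size
probe = torch.nn.Linear(d_model, 1)
acts_pos = torch.randn(256, d_model)                 # activations for "statement + Yes"
acts_neg = torch.randn(256, d_model)                 # activations for "statement + No"

opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ccs_loss(probe, acts_pos, acts_neg)
    loss.backward()
    opt.step()
```

The red-teaming point quoted above is that nothing in this loss singles out the model's knowledge: other salient features in the prompt can satisfy it just as well.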

Aleksander Madry (@aleks_madry):

So happy about this release and grateful to my awesome Preparedness team (especially Tejal Patwardhan), Policy Research, SuperAlignment and all of OpenAI for the hard work it took to get us here. It is still only a start but the work will continue!

@mrgunn ⏸️ (@mrgunn):

I'm a PhD biologist and I read OpenAI's threat preparedness assessment plan for CBRN threats. It appears to be total nonsense designed without any input from a scientist. Here's why:

Ajeya Cotra (@ajeya_cotra):

Technical researchers often don't have govt careers on their radar, but govts need technical expertise to properly regulate complex and fast-moving tech. UK AISI is an unusual opportunity to do very technical work that plugs directly into govt decisionmaking!

Sasha Luccioni, PhD 🦋🌎✨🤗 (@sashamtl):

I really hope that these recent findings about LAION will be the catalyst for changing the way we collect, curate and use datasets in AI. It's going to take more than a technological fix; we need a fundamental paradigm shift for all stages of the life cycle. A 🧵:

Evan Hubinger (@evanhub):

Following up on our recent "Sleeper Agents" paper, I'm very excited to announce that I'm leading a team at Anthropic that is explicitly tasked with trying to prove that Anthropic's alignment techniques won't work, and I'm hiring! alignmentforum.org/posts/EPDSdXr8…

Cas (Stephen Casper) (@stephenlcasper):

Thoughts on the new Anthropic paper: (1) It's seminal, really useful, and a slam dunk in general. (2) But it's not the *first* evidence we have of insidious failure modes evading adversarial training in LLMs. I compiled some related work here: alignmentforum.org/posts/mFAvspg4…

Marius Mosbach (@mariusmosbach):

Excited by the recent Anthropic paper? 👉 Here are some earlier papers that already provide evidence that fine-tuning doesn't change the model much (except for the last layers): arxiv.org/abs/2004.14448, arxiv.org/abs/2010.02616, arxiv.org/abs/2006.04884, arxiv.org/abs/2107.04734

Cas (Stephen Casper) (@stephenlcasper):

Sometime in the next few months, Anthropic is expected to release a research report/paper on sparse autoencoders. Before this happens, I want to make some predictions about what it will accomplish. Overall, I think that the Anthropic SAE paper, when it comes out, will …
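For reference, a sparse autoencoder in this context is, at its simplest, an overcomplete autoencoder trained on model activations with an L1 penalty that pushes the learned features to be sparse and, hopefully, interpretable. A bare-bones sketch with hypothetical dimensions (not Anthropic's implementation):

```python
# Bare-bones sparse autoencoder sketch (hypothetical sizes; not Anthropic's implementation).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # overcomplete: many more features than dims
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))             # sparse feature activations
        return self.decoder(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(64, 512)                              # stand-in for residual-stream activations
recon, feats = sae(acts)

l1_coeff = 1e-3
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()  # reconstruction + sparsity
loss.backward()
```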

Gary Marcus (@garymarcus):

Every new data point points in the same direction: exponential growth has stalled. It’s been 14 months and nothing that merits the name GPT-5 has emerged. How long before we acknowledge that alternative approaches are needed?

Rosie (@rosiecampbell):

Who is thinking about the intersection of AI and epistemics? E.g., beneficial use cases (AI for forecasting, etc.), the implications of AI for epistemic security, etc. Would love to chat.