Dalton brown (@daltonbrown944)'s Twitter Profile
Dalton brown

@daltonbrown944

AI Alignment, AI ethics

ID: 1624063095663460352

Joined: 10-02-2023 15:10:01

56 Tweets

502 Followers

2.2K Following

Dan Hendrycks (@danhendrycks):

Since Senator Schumer is pushing for Congress to regulate AI, here are five promising AI policy ideas:
* external red teaming
* interagency oversight commission
* internal audit committees
* external incident investigation team
* safety research funding
(🧵 below)

Russ Salakhutdinov (@rsalakhu):

“Talent hits a target no one else can hit; Genius hits a target no one else can see” A. Schopenhauer I've seen Geoffrey Hinton hit targets no one thought existed. So even though I don't fully agree with Geoff’s views on AI risks, I'd still listen carefully to what he has to say.

David Krueger (@davidskrueger):

BTW, 2 papers I often recommend:
1) "The alignment problem from a deep learning perspective" by Richard Ngo et al., for a research overview: arxiv.org/abs/2209.00626
2) "Natural Selection Favors AIs over Humans" by Dan Hendrycks, for the argument for risk: arxiv.org/abs/2303.16200

Brad Neuberg (@bradneuberg):

Terence Tao, the famous mathematician, on using LLMs to aid in mathematical research: "2023-level AI can already generate suggestive hints and promising leads to a working mathematician and participate actively in the decision-making process. When integrated with tools such as …

Leopold Aschenbrenner (@leopoldasch):

Intuitively, superhuman AI systems should "know" if they're acting safely. But can we "summon" such concepts from strong models with only weak supervision? Incredibly excited to finally share what we've been working on: weak-to-strong generalization. 1/ x.com/OpenAI/status/…
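For readers new to the idea, here is a toy analogue of the weak-to-strong setup described above: a small "weak supervisor" model is trained on ground truth, a larger "strong student" is then trained only on the supervisor's imperfect labels, and we measure how much of the student's ceiling performance those weak labels recover. The dataset, models, and metric below are illustrative stand-ins, not OpenAI's actual experiments.

```python
# Toy weak-to-strong generalization sketch (illustrative only; not OpenAI's code).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=40, n_informative=12,
                           shuffle=False, random_state=0)
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, test_size=0.66, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Weak supervisor: a simple model that only sees a few features, so its labels are imperfect.
weak = LogisticRegression(max_iter=1000).fit(X_sup[:, :5], y_sup)
weak_labels = weak.predict(X_train[:, :5])

# Strong student trained on ground truth (ceiling) vs. trained only on the weak labels.
strong_ceiling = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
strong_w2s = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

acc_weak = weak.score(X_test[:, :5], y_test)
acc_ceiling = strong_ceiling.score(X_test, y_test)
acc_w2s = strong_w2s.score(X_test, y_test)

# Performance gap recovered: what fraction of the weak-to-ceiling gap the student closes.
pgr = (acc_w2s - acc_weak) / (acc_ceiling - acc_weak)
print(f"weak={acc_weak:.3f}  weak-to-strong={acc_w2s:.3f}  ceiling={acc_ceiling:.3f}  PGR={pgr:.2f}")
```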

Ryan Carey (@ryancareyai):

Rapid AI progress arguably places at risk everything that we value, which includes wealth amounting to 5,000 times OpenAI's ~$90B market cap, and more. To mitigate this negative externality, OpenAI will allocate ~1/10,000 of its market cap. Doesn't really cut it.
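For scale, the arithmetic the tweet is pointing at, using only its own rough figures (a sketch, not independently verified):

```latex
% Rough scale comparison implied above: the tweet's own figures, not verified here.
\[
5{,}000 \times \$90\,\mathrm{B} \approx \$450\,\mathrm{T} \quad \text{(wealth framed as at risk)},
\qquad
\frac{\$90\,\mathrm{B}}{10{,}000} \approx \$9\,\mathrm{M} \quad \text{(the allocation)}.
\]
```

On those numbers the allocation is tens of millions of times smaller than the stake it is meant to protect, which is the mismatch being pointed out.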

Vinodkumar Prabhakaran (@vinodkpg):

Great to be presenting this work on cross-cultural and moral differences that shape perceptions of offense in language at the #NeurIPS2023 MP2 Workshop (aipsychphil.github.io). Poster at 2:50pm CT; room 255. Work led by Aida Davani, in collaboration w/ Mark Díaz and dylan.

Zac Kenton (@zackenton1):

In our new Google DeepMind paper, we redteam methods that aim to discover latent knowledge through unsupervised learning from LLM activation data. TL;DR: Existing methods can be easily distracted by other salient features in the prompt. arxiv.org/abs/2312.10029 🧵👇
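For context, the methods being red-teamed are unsupervised probes in the style of contrast-consistent search (CCS, Burns et al.): a probe is fit to contrast-pair activations with a consistency-plus-confidence loss and no labels. A minimal sketch, with random tensors standing in for real activations (an assumption for illustration, not the paper's code):

```python
# Minimal CCS-style unsupervised probe (illustrative sketch; not the paper's code).
import torch

def ccs_loss(probe, acts_pos, acts_neg):
    # probe maps an activation vector to P(statement is true)
    p_pos = torch.sigmoid(probe(acts_pos)).squeeze(-1)
    p_neg = torch.sigmoid(probe(acts_neg)).squeeze(-1)
    consistency = (p_pos - (1.0 - p_neg)) ** 2      # "x" and "not x" should get complementary probabilities
    confidence = torch.minimum(p_pos, p_neg) ** 2   # discourage the degenerate 0.5/0.5 solution
    return (consistency + confidence).mean()

d_model = 768                                        # hypothetical hidden size
probe = torch.nn.Linear(d_model, 1)
acts_pos = torch.randn(256, d_model)                 # activations for "statement + Yes"
acts_neg = torch.randn(256, d_model)                 # activations for "statement + No"

opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ccs_loss(probe, acts_pos, acts_neg)
    loss.backward()
    opt.step()
```

The red-teaming point quoted above is that nothing in this loss singles out the model's knowledge: other salient features in the prompt can satisfy it just as well.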

Aleksander Madry (@aleks_madry):

So happy about this release and grateful to my awesome Preparedness team (especially Tejal Patwardhan), Policy Research, SuperAlignment and all of OpenAI for the hard work it took to get us here. It is still only a start but the work will continue!

@mrgunn ⏸️ (@mrgunn):

I'm a PhD biologist and I read OpenAI's threat preparedness assessment plan for CBRN threats. It appears to be total nonsense designed without any input from a scientist. Here's why:

Ajeya Cotra (@ajeya_cotra):

Technical researchers often don't have govt careers on their radar, but govts need technical expertise to properly regulate complex and fast-moving tech. UK AISI is an unusual opportunity to do very technical work that plugs directly into govt decisionmaking!

Sasha Luccioni, PhD 🦋🌎✨🤗 (@sashamtl):

I really hope that these recent findings about LAION will be the catalyst for changing the way we collect, curate and use datasets in AI. It's going to take more than a technological fix; we need a fundamental paradigm shift for all stages of the life cycle. A 🧵:

Evan Hubinger (@evanhub):

Following up on our recent "Sleeper Agents" paper, I'm very excited to announce that I'm leading a team at Anthropic that is explicitly tasked with trying to prove that Anthropic's alignment techniques won't work, and I'm hiring! alignmentforum.org/posts/EPDSdXr8…

Cas (Stephen Casper) (@stephenlcasper):

Thoughts on the new Anthropic paper: (1) It's seminal, really useful, and a slam dunk in general. (2) But it's not the *first* evidence we have of insidious failure modes evading adversarial training in LLMs. I compiled some related work here: alignmentforum.org/posts/mFAvspg4…

Marius Mosbach (@mariusmosbach):

Excited by the recent Anthropic paper? 👉 Here are some earlier papers that already provide evidence that fine-tuning doesn't change the model much (except for the last layers): arxiv.org/abs/2004.14448, arxiv.org/abs/2010.02616, arxiv.org/abs/2006.04884, arxiv.org/abs/2107.04734

Cas (Stephen Casper) (@stephenlcasper):

Sometime in the next few months, Anthropic is expected to release a research report/paper on sparse autoencoders. Before this happens, I want to make some predictions about what it will accomplish. Overall, I think that the Anthropic SAE paper, when it comes out, will …
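For reference, a sparse autoencoder in this context is, at its simplest, an overcomplete autoencoder trained on model activations with an L1 penalty that pushes the learned features to be sparse and, hopefully, interpretable. A bare-bones sketch with hypothetical dimensions (not Anthropic's implementation):

```python
# Bare-bones sparse autoencoder sketch (hypothetical sizes; not Anthropic's implementation).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # overcomplete: many more features than dims
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))             # sparse feature activations
        return self.decoder(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(64, 512)                              # stand-in for residual-stream activations
recon, feats = sae(acts)

l1_coeff = 1e-3
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()  # reconstruction + sparsity
loss.backward()
```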

Gary Marcus (@garymarcus):

Every new data point points in the same direction: exponential growth has stalled. It’s been 14 months and nothing that merits the name GPT-5 has emerged. How long before we acknowledge that alternative approaches are needed?

Rosie (@rosiecampbell):

Who is thinking about the intersection of AI and epistemics? E.g., beneficial use cases (AI for forecasting, etc.), the implications of AI for epistemic security, etc. Would love to chat.