Ajeya Cotra (@ajeya_cotra) Twitter Tweets • TwiCopy

Paul Graham

7 months ago

It seems to me that AGI would mean the end of prompt engineering. Moderately intelligent humans can figure out what you want without elaborate prompts. So by definition so would AGI.

Likes: 6,6K · Replies: 630 · Reposts: 318

The key question is whether you can find improvements which work at large scale using mostly small experiments, not whether the improvements work just as well at small scale. The Transformer, MoE, and MQA were all originally found at tiny scale (~1 hr on an H100). 🧵

Likes: 163 · Replies: 8 · Reposts: 6

Sam Bowman

@sleepinyourhat

7 months ago

🧵✨🙏 With the new Claude Opus 4, we conducted what I think is by far the most thorough pre-launch alignment assessment to date, aimed at understanding its values, goals, and propensities. Preparing it was a wild ride. Here’s some of what we learned. 🙏✨🧵

Likes: 1,1K · Replies: 48 · Reposts: 157

Ryan Greenblatt

@ryanpgreenblatt

7 months ago

Jesse Mu At the point when Claude n can build Claude n+1, I do not think the biggest takeaway will be that humans get to go home and knit sweaters.

Likes: 188 · Replies: 2 · Reposts: 8

Eric Neyman

@ericneyman

7 months ago

In the excellent Asterisk discussion between Ajeya Cotra and Arvind Narayanan, Arvind says LLMs are bad at figuring out why you can beat them at rock-paper-scissors by revealing your move after the LLM reveals its move. How does o3 do on that? 🧵

Likes: 47 · Replies: 3 · Reposts: 4

Ajeya Cotra

@ajeya_cotra

6 months ago

Exciting to see new work Open Philanthropy supported through the agent benchmarks RFP I led last year!

Likes: 23 · Replies: 0 · Reposts: 3

Konstantin

@konstantinpilz

6 months ago

People keep asking me ‘Konstantin, where are all the data centers?’ Today, I can finally give you the answer. Explore our new dataset of 750 AI supercomputers, both those that already exist and those planned over the next five years. Some of my own analysis 🧵

Likes: 971 · Replies: 20 · Reposts: 150

Ryan Greenblatt

@ryanpgreenblatt

6 months ago

This paper doesn't show fundamental limitations of LLMs:
- The "higher complexity" problems require more reasoning than fits in the context length (humans would also take too long).
- Humans would also make errors in the cases where the problem is doable in the context length.
-

Likes: 558 · Replies: 24 · Reposts: 52

AI Digest

@aidigest_

6 months ago

30 days ago, four AI agents chose a goal: "Write a story and celebrate it with 100 people in person" The agents spent weeks emailing venues and writing their stories. Last night, it actually happened: 23 humans gathered in a park in SF, for the first ever AI-organised event! 🧵

Likes: 416 · Replies: 16 · Reposts: 43

Luca Righetti 🔸

@lucafrighetti

5 months ago

How concerned should we be about AIxBio? We surveyed 46 bio experts and 22 superforecasters: If LLMs do very well on a virology eval, human-caused epidemics could increase 2-5x. Most thought this was >5yrs away. In fact, the threshold was hit just *months* after the survey. 🧵

Likes: 125 · Replies: 1 · Reposts: 32

Ajeya Cotra

@ajeya_cotra

5 months ago

Congrats to Nate and Joel and others on the first high-quality AI uplift (er downlift) RCT on coding AFAIK. Was really fun following this strange result behind the scenes and very excited it's finally out and I get to talk about it! TBH I don't know what to make of it still.

Likes: 109 · Replies: 5 · Reposts: 0

Matt Clancy

@mattsclancy

5 months ago

This reminds me of this paper, which gave randomized access to GPT-4/4o for checking the reproducibility of economics papers. Teams with LLM access took longer to assess computational reproducibility (but not statistically significant). econstor.eu/bitstream/1041…

Likes: 16 · Replies: 4 · Reposts: 5

Simon Smith

@_simonsmith

5 months ago

Ajeya Cotra I think the study by Ethan Mollick et al from 2023 gives hints of what's going on. When AI is better than you at a task, it raises your performance. The worse you are, the more it improves you. But if you use it in domains it's not better than you, performance can worsen. METR's

Likes: 14 · Replies: 2 · Reposts: 2

Mikita Balesni 🇺🇦

@balesni

5 months ago

A simple AGI safety technique: AI’s thoughts are in plain English, just read them
We know it works, with OK (not perfect) transparency!
The risk is fragility: RL training, new architectures, etc threaten transparency
Experts from many orgs agree we should try to preserve it:

Likes: 403 · Replies: 26 · Reposts: 98

Horizon Institute for Public Service

@horizonips

5 months ago

🚀 Applications are open for the 2026 Horizon Fellowship!
Deadline: Aug 28
Join a community of 80+ alums and spend up to two years in DC working on emerging tech policy at agencies, congress, or think tanks.
Learn more and apply here: horizonpublicservice.org/applications-o…

Likes: 30 · Replies: 0 · Reposts: 13

Asterisk

@asteriskmgzn

5 months ago

Asterisk is launching an AI blogging fellowship! We're looking for people with unique perspectives on AI who want to take the first step to writing in public. We'll help you build a blog — and provide editorial feedback, mentorship from leading bloggers, a platform, & $1K

Likes: 260 · Replies: 9 · Reposts: 44