Jide 🔍 (@jide_alaga) Twitter Tweets • TwiCopy

Alan Chan

@_achan96_

7 months ago

It's great that the new xAI risk management framework has a section on agent infrastructure!

Likes: 63 · Replies: 4 · Reposts: 7

Josh Smith

@smithtjosh

6 months ago

When is the last time you read something where the headline _understated_ the argument and evidence? 𝙲𝚑𝚊𝚛𝚕𝚎𝚜 𝙲. 𝙼𝚊𝚗𝚗's piece is that good

Likes: 12.12K · Replies: 75 · Reposts: 862

Ajeya Cotra

@ajeya_cotra

6 months ago

Impressions from talking to ML researchers and engineers about how they use AI, focusing on weaknesses and frictions (strengths are better covered by benchmarks) 🧵

Likes: 130 · Replies: 4 · Reposts: 18

Jide 🔍

@jide_alaga

6 months ago

How is SSI already worth half of Anthropic??

Likes: 9 · Replies: 2 · Reposts: 0

Yo Shavit

3) If you get greedy and decide to directly train the CoT not to think about reward hacking, it seems to work for a bit, but then models eventually still learn to reward-hack… except they hide misaligned reasoning so it doesn’t show up in their CoT!

Likes: 148 · Replies: 6 · Reposts: 9

Marius Hobbhahn

@mariushobbhahn

6 months ago

I think this paper is really important! 1. It shows that current models already have the capabilities and propensities to do surprisingly clever reward hacks. 2. It shows the utility of CoT monitoring in the regime where the CoT is legible and faithful. 3. IMO, the most

Likes: 182 · Replies: 5 · Reposts: 14

Jide 🔍

@jide_alaga

5 months ago

I've been informally referring to this as "Barnes' Law" because nothing else hits as hard 🔥

Likes: 80 · Replies: 4 · Reposts: 5

Jide 🔍

@jide_alaga

5 months ago

At this point there's no reason for it to take 2-8 years to animate a Studio Ghibli movie. I want one every 6 months.

Likes: 3 · Replies: 0 · Reposts: 0

Jide 🔍

@jide_alaga

5 months ago

One step closer to Coordinated Pausing: governance.ai/research-paper…

Likes: 4 · Replies: 0 · Reposts: 0

Jide 🔍

@jide_alaga

5 months ago

I genuinely believe I have a happier life than a lot of famous people and probably most A-listers. I don't understand why people want to be famous so badly.

Likes: 4 · Replies: 1 · Reposts: 0

Jonas Schuett

@jonasschuett

5 months ago

I'm Dr. Jonas Schuett 🥳 Thanks to my supervisor Tobias Tröger from Goethe-Universität!

Likes: 45 · Replies: 2 · Reposts: 2

Jide 🔍

@jide_alaga

5 months ago

Shower thought: I would love to see something like a memetic dashboard, showing the most powerful memes in the world, where they are growing/declining, and describing how (and how strongly) they tend to motivate behaviour.

Likes: 3 · Replies: 1 · Reposts: 0

Chris Painter

@chrispainteryup

4 months ago

When should AI companies publish system cards? I want to make the case that the ideal system would involve something closer to quarterly reporting, rather than focusing so much on deployment. Sharing here to get pushback and debate🧵

Likes: 65 · Replies: 3 · Reposts: 9

Chris Painter

@chrispainteryup

4 months ago

Cool and kind of wild to see METR's work on agent task length doubling-times mentioned in the opening moments of Joe Rogan's 3-hour episode on AI with Jeremie Harris and Edouard Harris

Likes: 85 · Replies: 5 · Reposts: 4

Jide 🔍

@jide_alaga

4 months ago

I think these kinds of company-evaluator collaborations provide much better public assurances for safety than the current status quo, and I think it's incredibly exciting! We need more of this, kudos to Amazon!

Likes: 2 · Replies: 0 · Reposts: 0

OpenAI

@openai

3 months ago

Introducing the Safety Evaluations Hub—a resource to explore safety results for our models. While system cards share safety metrics at launch, the Hub will be updated periodically as part of our efforts to communicate proactively about safety. openai.com/safety/evaluat…

Likes: 1.1K · Replies: 111 · Reposts: 152

Joel Becker

@joel_bkr

3 months ago

wicked preliminary result from Thomas Akira Kwa. AI time horizon, and doubling time of time horizon, seems to vary a lot by domain -- and METR's HCAST task suite is in the middle for both

Likes: 85 · Replies: 2 · Reposts: 13

Rob Wiblin

@robertwiblin

3 months ago

AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month. I discuss the most important graph in AI today with Beth Barnes, the CEO of METR,

Likes: 211 · Replies: 11 · Reposts: 26
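
The arithmetic behind the post above can be checked with a rough sketch. It takes the two figures stated in the post (a one-hour task length at 50% success, doubling every 7 months) and assumes, purely for illustration, that the trend continues unchanged. The function name `time_horizon_hours` and its parameters are hypothetical and not taken from METR's work.

```python
def time_horizon_hours(months_from_now, start_hours=1.0, doubling_months=7.0):
    """Projected task length (in human-expert hours) at the 50% success level.

    Assumes steady exponential growth: the horizon doubles once every
    `doubling_months`. Both defaults are the figures quoted in the post.
    """
    return start_hours * 2 ** (months_from_now / doubling_months)


if __name__ == "__main__":
    for years in (2, 4):
        hours = time_horizon_hours(12 * years)
        print(f"In {years} years: roughly {hours:.0f} hours")
```

Under these assumptions the projection is roughly 11 hours after 2 years, a little over one working day, and roughly 116 hours after 4 years, or about three 40-hour working weeks, which is the same order of magnitude as the "full month" in the post.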
