Charles Foster (@cfgeek)

Charles Foster

@cfgeek

“It’s OK, he’s friendly!” 🪄 Tensor-enjoyer 🧪 @METR_Evals. Occasionally writing at “Context Windows” on Substack. 🦋: cfoster.bsky.social

ID: 1275310261696557057

https://contextwindows.substack.com · 23-06-2020 06:10:56

4.4K Tweets

2.2K Followers

431 Following

Charles Foster

@cfgeek

4 months ago

At least three different companies now (OpenAI, Google, and Harmonic) have announced that their AI systems managed to solve the same 5 out of 6 problems from this year’s IMO.

Likes: 11 · Replies: 2 · Reposts: 0

Why would anyone try to “build AGI in secret” when they could just pursue it openly, proclaiming loudly that it’s their top priority, monetizing whatever valuable stuff they develop along the way, with their employees + investors + customers + fans all cheering them on?

Likes: 38 · Replies: 5 · Reposts: 0

Charles Foster

@cfgeek

4 months ago

At last, we’ve built the automatic measure-targeter from the famed adage “When a measure becomes a target, it ceases to be a good measure”!

Likes: 10 · Replies: 0 · Reposts: 0

Stefan Schubert

@stefanfschubert

4 months ago

In 2025, "AI capex" - information processing equipment plus software - has added more to US growth than consumer spending. This in spite of the fact that the former is 6% of the economy, and the latter 70%.

Likes: 92 · Replies: 6 · Reposts: 17

Charles Foster

@cfgeek

4 months ago

This smells funny. Suspect there’s something else going on here

Likes: 15 · Replies: 4 · Reposts: 0

Charles Foster

@cfgeek

4 months ago

“It’s OK to release because it doesn’t increase risk by much.” “Relative to Internet access, right?” “…” “Relative to Internet access, right?!”

Likes: 14 · Replies: 1 · Reposts: 0

Charles Foster

@cfgeek

4 months ago

By default, we’ll see open-weight models catch up to this capability level within the next ~12 months. And then what?

Likes: 143 · Replies: 12 · Reposts: 8

Charles Foster

@cfgeek

4 months ago

“Pre-training is still scaling! Pre-training is still scaling!” I continue to insist as I slowly reallocate my entire training cluster to RL

Likes: 147 · Replies: 2 · Reposts: 6

Charles Foster

@cfgeek

4 months ago

Really like that this shows how the cost/accuracy possibility frontier is expanding over time

Likes: 39 · Replies: 1 · Reposts: 1

METR

@metr_evals

4 months ago

In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.

Likes: 292 · Replies: 7 · Reposts: 42

Sydney

@sydneyvonarx

4 months ago

The terms “CoT” and “reasoning trace” make it sound like the CoT is a summary of an LLM’s reasoning. But IMO it’s more accurate to view CoT as a tool models use to think better. CoT monitoring is about tracking how models use this tool so we can glean insight into their

Likes: 115 · Replies: 5 · Reposts: 12

Charles Foster

@cfgeek

4 months ago

AI company that waits for a backwards-incompatible API change to change its version number

Likes: 5 · Replies: 0 · Reposts: 0