Charles Foster (@cfgeek)'s Twitter Profile
Charles Foster

@cfgeek

“It’s OK, he’s friendly!” 🪄 Tensor-enjoyer 🧪 @METR_Evals. Occasionally writing at “Context Windows” on Substack. 🦋: cfoster.bsky.social

ID: 1275310261696557057

https://contextwindows.substack.com
23-06-2020 06:10:56

4.4K Tweets

2.2K Followers

431 Following

Charles Foster (@cfgeek)

At least three different companies now (OpenAI, Google, and Harmonic) have announced that their AI systems managed to solve the same 5 out of 6 problems from this year’s IMO.

Charles Foster (@cfgeek)

Why would anyone try to “build AGI in secret” when they could just pursue it openly, proclaiming loudly that it’s their top priority, monetizing whatever valuable stuff they develop along the way, with their employees + investors + customers + fans all cheering them on?

Charles Foster (@cfgeek)

At last, we’ve built the automatic measure-targeter from the famed adage “When a measure becomes a target, it ceases to be a good measure”!

Stefan Schubert (@stefanfschubert)

In 2025, "AI capex" - information processing equipment plus software - has added more to US growth than consumer spending. This in spite of the fact that the former is 6% of the economy, and the latter 70%.

Charles Foster (@cfgeek)

“It’s OK to release because it doesn’t increase risk by much.” “Relative to Internet access, right?” “…” “Relative to Internet access, right?!”

Charles Foster (@cfgeek)

“Pre-training is still scaling! Pre-training is still scaling!” I continue to insist as I slowly reallocate my entire training cluster to RL

METR (@metr_evals)

In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.

Sydney (@sydneyvonarx)

The terms “CoT” and reasoning trace make it sound like the CoT is a summary of an LLM’s reasoning. But IMO it’s more accurate to view CoT as a tool models use to think better. CoT monitoring is about tracking how models use this tool so we can glean insight into their