Bayesian (@bayesian0_0)'s Twitter Profile
Bayesian

@bayesian0_0

top 5 manifold forecaster manifold.markets/Bayesian mostly predicting ai development

ID: 1808407868200345602

Joined: 03-07-2024 07:50:28

131 Tweets

118 Followers

913 Following

Trevor Levin (@trevposts)'s Twitter Profile Photo

Back for my monthly few minutes on Twitter and the main thought I want to share is that it seems like AI progress in H1 2025 was much slower than I feared after the crazy o3 benchmark numbers in December, so I now put less weight on very short (e.g. AI 2027-type) timelines.

Sheryl Hsu (@sherylhsu02)'s Twitter Profile Photo

The model solves these problems without tools like Lean or coding; it just uses natural language, and it only has 4.5 hours. We see the model reason at a very high level: trying out different strategies, making observations from examples, and testing hypotheses.

Bayesian (@bayesian0_0)'s Twitter Profile Photo

thought this too, and with nat lang IMO gold and upcoming GPT-5 I am looking forward to seeing a really important new data point wrt this

Neel Nanda (@neelnanda5)'s Twitter Profile Photo

Speaking as a past IMO contestant, this is impressive but misleading: gold vs silver is meaningless, and 1 pt below gold vs borderline gold is noise. The impressive bit is using a general reasoning model, not a specialised system, and no verified reward. Peak AI maths is unchanged.

Bayesian (@bayesian0_0)'s Twitter Profile Photo

I srsly thought you could get to AGI just by scaling 2023 LLMs, and I still think so. I never thought that was likely to be the first arch that got us to AGI, because new algorithmic improvements are found constantly. A lot of miscommunication happens around likely vs possible.

Bayesian (@bayesian0_0)'s Twitter Profile Photo

what are the odds that gpt-5's pretraining base is bigger than gpt-5 is, and gpt-5-main is the product of a distillation, vs the pretraining base being the same size as gpt-5? i'd make a market if this was publicly verifiable but since it isn't i ask y'all
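For readers unfamiliar with the setup the question assumes: "distillation" here means training a smaller student model to imitate a larger teacher's output distribution. A minimal sketch of the standard soft-label loss, with all shapes and the temperature chosen purely for illustration (nothing here reflects OpenAI's actual training recipe):

```python
# Illustrative sketch of distillation: a smaller "student" trained to match
# a larger pretrained "teacher". Sizes and temperature are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    next-token distributions (Hinton et al.-style soft-label loss)."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy check: a batch of 4 positions over a 10-token vocabulary.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
print(distillation_loss(student, teacher))
```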

Daniel Eth (yes, Eth is my actual last name) (@daniel_271828)'s Twitter Profile Photo

Kinda feel like there were pretty similar steps in improvement for each of: GPT2 -> GPT3, GPT3 -> GPT4, and GPT4 -> GPT5. It’s just that most of the GPT4 -> GPT5 improvement was already realized by o3, and the step from there to GPT5 wasn’t that big.

Matthew Barnett (@matthewjbar)'s Twitter Profile Photo

Every consumer good has consumer surplus, so this explanation is too general to explain much about AI in particular. A better explanation for why AI isn't meaningfully showing up in GDP is that AI has simply had a relatively small impact on economic production so far.

Epoch AI (@epochairesearch)'s Twitter Profile Photo

The higher the FrontierMath difficulty tier, the lower GPT-5 scored. This suggests a correlation between what mathematicians find difficult and what makes problems harder for AI systems to solve.

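To make the claimed relationship concrete: with per-tier accuracies in hand, the negative association can be summarized with a rank correlation. A minimal sketch, using made-up tier scores rather than Epoch AI's actual published numbers:

```python
# Hypothetical illustration of the claimed trend: accuracy falling as the
# FrontierMath difficulty tier rises. The scores below are invented for
# the sketch; they are NOT Epoch AI's published results.
from scipy.stats import spearmanr

tiers = [1, 2, 3, 4]               # difficulty tiers (1 = easiest)
scores = [0.40, 0.25, 0.10, 0.02]  # assumed GPT-5 accuracy per tier

rho, p_value = spearmanr(tiers, scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A rho near -1 reflects the monotone negative relationship the tweet
# describes: tiers mathematicians find harder track lower model scores.
```
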
Teortaxes▶️ (DeepSeek Twitter 🐋 die-hard fan 2023 – ∞) (@teortaxestex)'s Twitter Profile Photo

I repeat that this is cope for boomers who didn't do the math. The US has enough capacity to power their AGI race. Where this isn't true, hyperscalers will complete private power plants soon enough. China doesn't have nearly enough compute to make use of the power advantage.

Jeffrey Ladish (@jeffladish)'s Twitter Profile Photo

Update your AI timelines based on how pretrain + RL scaling is going, not based on OpenAI's naming conventions. GPT-5 is just the model OpenAI decided to call 5. They could have called GPT-4.5 or o3 "GPT-5" if they wanted to
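Ladish's point is a plain Bayesian one: condition on the scaling evidence itself, not on the label attached to a release. A toy sketch of that update, with every probability an illustrative assumption rather than anyone's real forecast:

```python
# Minimal sketch of the update Ladish describes: revise P(short timelines)
# on scaling evidence, not on model names. All numbers are assumptions.

def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | E) from a prior and the two likelihoods."""
    numerator = prior * p_evidence_given_h
    denominator = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / denominator

prior_short_timelines = 0.30  # assumed prior on e.g. AGI-by-2027
# Evidence: pretrain + RL scaling gains looked modest this cycle.
p_modest_if_short = 0.25      # modest gains are unlikely on short timelines
p_modest_if_long = 0.70       # and fairly likely on long ones

posterior = bayes_update(prior_short_timelines,
                         p_modest_if_short, p_modest_if_long)
print(f"P(short timelines | modest scaling gains) = {posterior:.2f}")  # ~0.13
```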