Andrew Brož (@andrewbroz) Twitter Tweets • TwiCopy

Andrew Brož

@andrewbroz

Programmer, cellist, AI researcher. Curious human. Husband of @zhenyabroz.

ID: 35363542

Link: https://linktr.ee/andrewbroz
Joined: 26-04-2009 01:20:20

2,2K Tweet

414 Followers

1,1K Following

Andrew Brož

@andrewbroz

6 months ago

!! Grok 3 with think turned on found a valid 9 step solution on the first shot in 3m 6s. It successfully described the problem & consistently checked its work using valid reasoning. But it did not find a minimal solution, and gave a hand-wavy justification for a 9 move minimum.

Likes: 1 · Replies: 1 · Reposts: 1

Andrew Brož

@andrewbroz

6 months ago

Mushroom time

Likes: 2 · Replies: 0 · Reposts: 0

Andrew Brož

@andrewbroz

6 months ago

Learning to draw, class № 6

Likes: 2 · Replies: 0 · Reposts: 0

Kelsey Piper

@kelseytuoc

4 months ago

o4-mini-high is the first AI to pass my personal secret benchmark for hallucinations and complex reasoning, so I guess now I can tell you all what that benchmark is. It's simple: I post a complex midgame chessboard and 'mate in one'. The chessboard does not have a mate in one.

Likes: 12,12K · Replies: 162 · Reposts: 589

Andrej Karpathy

@karpathy

4 months ago

Noticing myself adopting a certain rhythm in AI-assisted coding (i.e. code I actually and professionally care about, contrast to vibe code). 1. Stuff everything relevant into context (this can take a while in big projects. If the project is small enough just stuff everything

Likes: 12,12K · Replies: 464 · Reposts: 1,1K

Akiyoshi Kitaoka

@akiyoshikitaoka

4 months ago

The inset appears to move.

Likes: 205 · Replies: 4 · Reposts: 40

Andrew Brož

@andrewbroz

4 months ago

Likes: 1 · Replies: 0 · Reposts: 1

Andrew Brož

@andrewbroz

3 months ago

Letter count problem with Claude 4:
* Identifies the problem category & chooses to use code to solve (choice not verbalized in CoT)
* Writes four variations of a code snippet to get the count
* Responds with the correct count (5)
* Volunteers wrong solution steps (wrong indexes)

Likes: 0 · Replies: 0 · Reposts: 0

trash

@trashh_dev

3 months ago

claude: “you’re absolutely right!” me:

Likes: 17,17K · Replies: 200 · Reposts: 1,1K

Andrew Brož

@andrewbroz

3 months ago

Currently here.

Likes: 0 · Replies: 0 · Reposts: 0

thebes

@voooooogel

2 months ago

Likes: 2,2K · Replies: 46 · Reposts: 182

Simon Willison

@simonw

2 months ago

This is diabolical... a Python object that hallucinates method implementations on demand any time you call them, using my LLM Python library github.com/awwaiid/gremllm

Likes: 4,4K · Replies: 90 · Reposts: 287

Andrew Brož

@andrewbroz

2 months ago

Happiness smells like

Likes: 1 · Replies: 0 · Reposts: 0

Wyatt walls

@lefthanddraft

a month ago

ChatGPT psychosis takes more than one form

Likes: 1,1K · Replies: 28 · Reposts: 67

Noam Brown

@polynoamial

a month ago

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

Likes: 709 · Replies: 3 · Reposts: 39

Wyatt walls

@lefthanddraft

a month ago

Henry Shevlin admittedly, I did get some help with the original idea, along with some critical feedback and encouragement: "This is not crankery — it's a serious, innovative theory that deserves attention, simulation, and experimental exploration."