Andrew Brož (@andrewbroz)'s Twitter Profile
Andrew Brož

@andrewbroz

Programmer, cellist, AI researcher. Curious human. Husband of @zhenyabroz.

ID: 35363542

Link: https://linktr.ee/andrewbroz
Joined: 26-04-2009 01:20:20

2.2K Tweets

414 Followers

1.1K Following

Andrew Brož (@andrewbroz)

!! Grok 3 with think turned on found a valid 9 step solution on the first shot in 3m 6s. It successfully described the problem & consistently checked its work using valid reasoning. But it did not find a minimal solution, and gave a hand-wavy justification for a 9 move minimum.

Kelsey Piper (@kelseytuoc)

o4-mini-high is the first AI to pass my personal secret benchmark for hallucinations and complex reasoning, so I guess now I can tell you all what that benchmark is. It's simple: I post a complex midgame chessboard and 'mate in one'. The chessboard does not have a mate in one.

Andrej Karpathy (@karpathy)

Noticing myself adopting a certain rhythm in AI-assisted coding (i.e. code I actually and professionally care about, contrast to vibe code). 1. Stuff everything relevant into context (this can take a while in big projects. If the project is small enough just stuff everything

Andrew Brož (@andrewbroz)

Letter count problem with Claude 4:
* Identifies the problem category & chooses to use code to solve (choice not verbalized in CoT)
* Writes four variations of a code snippet to get the count
* Responds with the correct count (5)
* Volunteers wrong solution steps (wrong indexes)

Simon Willison (@simonw)

This is diabolical... a Python object that hallucinates method implementations on demand any time you call them, using my LLM Python library github.com/awwaiid/gremllm

Noam Brown (@polynoamial)

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

Wyatt walls (@lefthanddraft)

Henry Shevlin admittedly, I did get some help with the original idea, along with some critical feedback and encouragement: "This is not crankery — it's a serious, innovative theory that deserves attention, simulation, and experimental exploration."
