Paul Calcraft (@paul_cal)'s Twitter Profile
Paul Calcraft

@paul_cal

AI is good & bad, actually.

Tweeting about AI/ML methods, software dev, research, tech and society, social impact.

20yrs in tech, 10 in ML/AI, PhD in comp sci

ID: 1681581835

Link: https://linkedin.com/in/paulcalcraft
Joined: 18-08-2013 20:21:45

5.5K Tweets

5.5K Followers

4.4K Following

near (@nearcyan)

dwarkesh's most recent episode with sholto + trenton is one of the best resources for vibe-checking your takes on LLMs/AI and thinking on the near-future, since broadly everything they say here is both correct and good: youtube.com/watch?v=64lXQP…

Paul Calcraft (@paul_cal)

AHHHHH. Average over N gives no advantage. It is definitely not fancy best of N. 10 flips of a coin tells you more about the coin's weighting than one flip. AHHHHH
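
A minimal sketch of the coin-flip point, assuming the tweet is about reporting a score averaged over N independent attempts versus picking the best of N. The per-attempt success rate `p` and both function names are hypothetical, chosen only for illustration. Averaging leaves the expected score at `p` and only shrinks the noise around it, whereas best-of-N raises the expected score itself.

```python
import math


def mean_of_n_stats(p: float, n: int) -> tuple[float, float]:
    """Expected value and standard deviation of the average of n flips.

    Each flip succeeds with probability p (an assumed, illustrative value).
    """
    return p, math.sqrt(p * (1 - p) / n)


def best_of_n_expectation(p: float, n: int) -> float:
    """Expected value of the best (max) of n flips.

    The max of n 0/1 outcomes is 1 unless every flip fails.
    """
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    p = 0.3  # hypothetical per-attempt success rate
    for n in (1, 10):
        mean, sd = mean_of_n_stats(p, n)
        best = best_of_n_expectation(p, n)
        print(f"n={n}: average-of-N mean={mean:.3f} sd={sd:.3f}, "
              f"best-of-N mean={best:.3f}")
```

With more flips, the average stays centred on the true weighting and gets tighter, which is the sense in which 10 flips tell you more than one without inflating the score.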

Paul Calcraft (@paul_cal)

Yep LLMs still suck at video games. 0.5-1.6% completion rate at best. Again, not surprising given that even visual tic tac toe or connect 4 is not yet a done deal

Paul Calcraft (@paul_cal)

Gemini solves an easy Portal test chamber in ~15 minutes. I try to control my frustration
01:01 looking at a closed door, hallucinates next puzzle thru the door
01:50 first dumb idea
03:12 realises key objective
03:41 thinks it's solved it; has not (relatable)
04:01 repeats same

Ethan Mollick (@emollick)

Nice example of both the power and limits of general purpose AI agents. Operator: "go find a multiplayer game you can play live online right now and win it against a human" With just that prompt it found a multiplayer tic-tac-toe game, joined it... & lost operator.chatgpt.com/v/683aa2835ea0…

Paul Calcraft (@paul_cal)

"Grandad, were you really there when computers first started to talk?" "That's right kiddo. I was right there in the first wave. Using them to streamline B2B SaaS"

Paul Calcraft (@paul_cal)

People are posting LLM slop on specific programming language subreddits now

And the best part? They can prompt them into bro/cuz lower case slop. But they can't escape the writing style —  it's completely unavoidable.

Paul Calcraft (@paul_cal)

o4-mini-high w search used a single reddit thread w 2 upvotes as the evidence for 4 different claims in a niche programming query I asked. Turns out thread is right. Is o4-mini gullible or wise? In information vacuum, I assume easily led astray by plausible but incorrect info?

Paul Calcraft (@paul_cal)

If you're a decent writer writing for reasonably discerning/invested/smart readers, the em dash thing just isn't a problem. You can still use em dashes because your readers' subconscious AI detectors will be much more sophisticated than "omg em dash = AI!"