Paul Calcraft (@paul_cal) Twitter Tweets • TwiCopy

Paul Calcraft

@paul_cal

AI is good & bad, actually.

Tweeting about AI/ML methods, software dev, research, tech and society, social impact.

20yrs in tech, 10 in ML/AI, PhD in comp sci

ID: 1681581835

Link: https://linkedin.com/in/paulcalcraft · Joined: 18-08-2013 20:21:45

5,5K Tweets

5,5K Followers

4,4K Following

web weaver

@deepfates

7 months ago

"Interdimensional Cable", shorts made with Veo 3 ai. By CodeSamurai on Reddit

649 likes · 29 replies · 59 retweets

Hashem Al-Ghaili

@hashemghaili

7 months ago

Prompt Theory (Made with Veo 3). What if AI-generated characters refused to believe they were AI-generated?

22,22K likes · 948 replies · 3,3K retweets

Paul Calcraft

@paul_cal

7 months ago

Ok that's kinda nice. First time I've seen summoning a coding agent on github via just an @ mention

3 likes · 1 reply · 0 retweets

dwarkesh's most recent episode with sholto + trenton is one of the best resources for vibe-checking your takes on LLMs/AI and thinking on the near-future, since broadly everything they say here is both correct and good: youtube.com/watch?v=64lXQP…

1,1K likes · 18 replies · 64 retweets

Paul Calcraft

@paul_cal

7 months ago

pass@1 vs. cons@k vs. parallel compute vs. pass@k

1 like · 1 reply · 0 retweets
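The metrics named in that tweet can be sketched in a few lines. This is a minimal illustration under the commonly used definitions, which the tweet itself does not spell out: pass@1 as the accuracy of a single sample, cons@k as whether the majority answer across k samples is correct, and pass@k as whether any of the k samples is correct. The function names and the sample answers are hypothetical.

```python
from collections import Counter

def pass_at_1(answers, truth):
    """Fraction of individual samples that are correct (single-sample accuracy)."""
    return sum(a == truth for a in answers) / len(answers)

def cons_at_k(answers, truth):
    """1.0 if the most common answer among the k samples is correct, else 0.0."""
    majority, _ = Counter(answers).most_common(1)[0]
    return float(majority == truth)

def pass_at_k(answers, truth):
    """1.0 if at least one of the k samples is correct, else 0.0."""
    return float(any(a == truth for a in answers))

# Hypothetical sampled answers for one problem whose correct answer is "42".
samples = ["41", "41", "42"]
truth = "42"

scores = {
    "pass@1": pass_at_1(samples, truth),   # one in three samples is right
    "cons@k": cons_at_k(samples, truth),   # majority vote picks "41", so it fails
    "pass@k": pass_at_k(samples, truth),   # at least one sample is right
}
```

On this one problem the three scores disagree, which is why results reported under different metrics are not directly comparable.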

Paul Calcraft

@paul_cal

7 months ago

AHHHHH. Average over N gives no advantage. It is definitely not fancy best of N. 10 flips of a coin tells you more about the coin's weighting than one flip. AHHHHH

8 likes · 0 replies · 0 retweets
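The coin-flip point can be checked with a small simulation. This is a sketch only, assuming a hypothetical single-attempt success rate `p = 0.3`. Averaging over N attempts estimates the same quantity as a single attempt, just with less noise, whereas best of N measures a different and much larger quantity.

```python
import random
import statistics

def avg_at_n(p, n, rng):
    """Average of n Bernoulli(p) trials: an estimate of p itself."""
    return sum(rng.random() < p for _ in range(n)) / n

def best_of_n(p, n):
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p) ** n

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.3                 # hypothetical true single-attempt success rate

# Repeat each estimator many times to see its centre and its spread.
one_flip = [avg_at_n(p, 1, rng) for _ in range(2000)]
ten_flips = [avg_at_n(p, 10, rng) for _ in range(2000)]

mean_1, mean_10 = statistics.mean(one_flip), statistics.mean(ten_flips)
spread_1, spread_10 = statistics.pstdev(one_flip), statistics.pstdev(ten_flips)

print(f"1 flip:   mean={mean_1:.3f} spread={spread_1:.3f}")
print(f"10 flips: mean={mean_10:.3f} spread={spread_10:.3f}")
print(f"best of 10: {best_of_n(p, 10):.3f}")
```

Both averages centre on `p`, but the 10-flip estimate has a much smaller spread. The best-of-10 figure sits far above `p` because it answers a different question.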

Paul Calcraft

@paul_cal

7 months ago

Yep LLMs still suck at video games. 0.5-1.6% completion rate at best. Again, not surprising given that even visual tic tac toe or connect 4 is not yet a done deal

4 likes · 0 replies · 0 retweets

Paul Calcraft

@paul_cal

7 months ago

I hear he passed the Claudes interview round without using jailbreaks

3 likes · 0 replies · 1 retweet

Paul Calcraft

@paul_cal

7 months ago

Gemini solves an easy Portal test chamber in ~15 minutes. I try to control my frustration
01:01 looking at a closed door, hallucinates next puzzle thru the door
01:50 first dumb idea
03:12 realises key objective
03:41 thinks it's solved it; has not (relatable)
04:01 repeats same

11 likes · 0 replies · 0 retweets

Ethan Mollick

@emollick

7 months ago

Nice example of both the power and limits of general purpose AI agents. Operator: "go find a multiplayer game you can play live online right now and win it against a human" With just that prompt it found a multiplayer tic-tac-toe game, joined it... & lost operator.chatgpt.com/v/683aa2835ea0…

310 likes · 17 replies · 16 retweets

Paul Calcraft

@paul_cal

6 months ago

"Grandad, were you really there when computers first started to talk?" "That's right kiddo. I was right there in the first wave. Using them to streamline B2B SaaS"

7 likes · 2 replies · 0 retweets

Paul Calcraft

@paul_cal

6 months ago

People are posting LLM slop on specific programming language subreddits now. And the best part? They can prompt them into bro/cuz lower case slop. But they can't escape the writing style — it's completely unavoidable.

7 likes · 1 reply · 0 retweets

Paul Calcraft

@paul_cal

6 months ago

o4-mini-high w search used a single reddit thread w 2 upvotes as the evidence for 4 different claims in a niche programming query I asked. Turns out thread is right. Is o4-mini gullible or wise? In information vacuum, I assume easily led astray by plausible but incorrect info?

10 likes · 2 replies · 0 retweets

Rob Wiblin

@robertwiblin

6 months ago

What in the actual fuck:

1,1K likes · 18 replies · 165 retweets

Paul Calcraft

@paul_cal

6 months ago

If you're a decent writer writing for reasonably discerning/invested/smart readers, the em dash thing just isn't a problem. You can still use em dashes because your readers' subconscious AI detectors will be much more sophisticated than "omg em dash = AI!"

4 likes · 1 reply · 0 retweets