Ted Sanders (@sandersted) Twitter Tweets • TwiCopy

Ted Sanders

a year ago

AGI is hard to define. my preferred definition of AGI is a computer system that can accomplish a task impossible for 100 human geniuses working together, such as publishing a blog post with a single canonical spelling of GPT-4o / gpt-4o / gpt4o

lmarena.ai (formerly lmsys.org)

@lmarena_ai

4 months ago

GPT-5 is here - and it’s #1 across the board.
🥇 #1 in Text, WebDev, and Vision Arena
🥇 #1 in Hard Prompts, Coding, Math, Creativity, Long Queries, and more
Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date. Huge congrats to OpenAI on this

Ted Sanders

@sandersted

4 months ago

GPT-5 is out! it's by no means perfect, but it's better than what's come before. if you have complaints about its coding, hit me up and I'll see if we can make future models even better for you

Ted Sanders

@sandersted

4 months ago

GPT-5 is here! it's way better at coding - not just in pointless evals, but real usage. using it in Cursor, I think my favorite leap is code Q&A - it's been truly useful at helping me figure out our complex RL codebase. very tenacious digger! openai.com/index/introduc…

eric zakariasson

@ericzakariasson

4 months ago

gpt-5 is now free in Cursor, go try it out! (reload cursor if you don't see it yet) we've worked closely with the OpenAI team to make this happen, and together we also put out a prompting guide for gpt-5. here are some example prompts we've seen working well

Cognition

@cognition_labs

4 months ago

GPT-5 represents a huge step up over previous OpenAI models, such as GPT-4.1. We believe GPT-5 is at the frontier of agentic ability and shines on tasks that require complex code understanding. On our junior SWE evals, GPT-5 is particularly strong at code exploration and

Lech Mazur

@lechmazur

4 months ago

GPT-5 (medium reasoning) sets a new record on the Confabulations/Hallucinations on Provided Texts benchmark!

Lech Mazur

@lechmazur

4 months ago

GPT-5 (medium reasoning) is the new leader on the Short Story Creative Writing benchmark! GPT-5 mini (medium reasoning) is much better than o4-mini (medium reasoning). Claude Opus 4.1 shows gains over Opus 4.

Daniel J

@djarosai

4 months ago

Full results: GPT-5 maintains strong performance. GPT-5-mini notably competitive with o3 and gemini-2.5-pro. Absolute accuracy numbers depend on instruction and task complexity and will vary across settings—key takeaway is relative model rankings and degradation patterns

Lisan al Gaib

@scaling01

4 months ago

As per tradition: A thread with all positive results I created or shared about GPT-5.
It's not my fault negative results take off more than positive ones.
I can't make a thread with more than 25 posts, but here you go:

Sam Altman

@sama

4 months ago

today we are significantly increasing rate limits for reasoning for chatgpt plus users, and all model-class limits will shortly be higher than they were before gpt-5. we will also shortly make a UI change to indicate which model is working.

Mostafa Rohaninejad

@mostafarohani

3 months ago

1/n I’m really excited to share that our OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have

Ted Sanders

@sandersted

3 months ago

LLMs are still far from being able to do most human work, but the pace of progress impresses me: in ~64 weeks, they've gone from 12% to 48% on GDPval. (also impressed that OpenAI keeps publishing useful papers & data, against its profit incentives) openai.com/index/gdpval/
