Ted Sanders (@sandersted)'s Twitter Profile
Ted Sanders

@sandersted

Research manager at OpenAI. Be kind to others, and yourself.

ID: 78419620

Link: http://www.tedsanders.com | Joined: 29-09-2009 21:20:46

623 Tweets

8.8K Followers

979 Following

Ted Sanders (@sandersted)

AGI is hard to define. my preferred definition of AGI is a computer system that can accomplish a task impossible for 100 human geniuses working together, such as publishing a blog post with a single canonical spelling of GPT-4o / gpt-4o / gpt4o

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

GPT-5 is here - and it’s #1 across the board. 🥇#1 in Text, WebDev, and Vision Arena 🥇#1 in Hard Prompts, Coding, Math, Creativity, Long Queries, and more Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date. Huge congrats to OpenAI on this

Ted Sanders (@sandersted)

GPT-5 is out! it's by no means perfect, but it's better than what's come before. if you have complaints about its coding, hit me up and I'll see if we can make future models even better for you

Ted Sanders (@sandersted)

GPT-5 is here! it's way better at coding - not just in pointless evals, but real usage. using it in Cursor, I think my favorite leap is code Q&A - it's been truly useful at helping me figure out our complex RL codebase. very tenacious digger! openai.com/index/introduc…

eric zakariasson (@ericzakariasson)

gpt-5 is now free in Cursor, go try it out! (reload cursor if you don't see it yet) we've worked closely with the OpenAI team to make this happen, and together we also put out a prompting guide for gpt-5. here are some example prompts we've seen working well

Cognition (@cognition_labs)

GPT-5 represents a huge step up over previous OpenAI models, such as GPT-4.1. We believe GPT-5 is at the frontier of agentic ability and shines on tasks that require complex code understanding. On our junior SWE evals, GPT-5 is particularly strong at code exploration and

Lech Mazur (@lechmazur)

GPT-5 (medium reasoning) is the new leader on the Short Story Creative Writing benchmark! GPT-5 mini (medium reasoning) is much better than o4-mini (medium reasoning). Claude Opus 4.1 shows gains over Opus 4.

Daniel J (@djarosai)

Full results: GPT-5 maintains strong performance. GPT-5-mini notably competitive with o3 and gemini-2.5-pro. Absolute accuracy numbers depend on instruction and task complexity and will vary across settings—key takeaway is relative model rankings and degradation patterns

Lisan al Gaib (@scaling01)

As per tradition: A thread with all positive results I created or shared about GPT-5. It's not my fault negative results take off more than positive ones. I can't make a thread with more than 25 posts, but here you go:

Sam Altman (@sama)

today we are significantly increasing rate limits for reasoning for chatgpt plus users, and all model-class limits will shortly be higher than they were before gpt-5. we will also shortly make a UI change to indicate which model is working.

Mostafa Rohaninejad (@mostafarohani)

1/n I’m really excited to share that our OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have

Ted Sanders (@sandersted)

LLMs are still far from being able to do most human work, but the pace of progress impresses me: in ~64 weeks, they've gone from 12% to 48% on GDPval. (also impressed that OpenAI keeps publishing useful papers & data, against its profit incentives) openai.com/index/gdpval/