Mark Chen (@markchen90) Twitter Tweets • TwiCopy

Mark Chen

6 months ago

Reasoning models like o3 are starting to aid in deep technical work and meaningful scientific discovery. Results like this will become increasingly common over the next year:

Aditya Agarwal

A lot of my ex-colleagues now work at OpenAI and other top-tier AI companies. When I ask them how things are going, without fail, they ALL mention how damn intense everything is. It's full-max pushing all the time. This is also borne out in my general conversations with

Mark Chen

@markchen90

6 months ago

Best reason to switch from using 4o to o3 is that o3 agrees the best running back in the NFL is Saquon Barkley!

OpenAI

@openai

6 months ago

OpenAI o3-pro today.

Noam Brown

@polynoamial

6 months ago

I'm fortunate to be able to devote my career to researching AI and building reasoning models like o3 for the world to use. If you want to join us in pushing forward the intelligence frontier, we're hiring at OpenAI.

OpenAI

@openai

5 months ago

OpenAI Podcast Episode 2 is now live! Mark Chen and Nick Turley join @andrewmayne to pull back the curtain on the making of ChatGPT. They also get into how products are developed and what’s next for agentic coding and multimodal assistants.

Jakub Pachocki

@merettm

5 months ago

I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview. As AI systems spend more compute working e.g. on long term research problems, it is

Psyho

@fakepsyho

5 months ago

Mark Chen AtCoder Amazing results btw. This is definitely much better than what the competitive programming community was expecting. I hope that you'll publish something about it. I'm curious about the testing budget / unique solutions created, which hopefully is something that can be disclosed.

Michelle Pokrass

@michpokrass

4 months ago

turns out you can always just work harder

Mark Chen

@markchen90

4 months ago

Impressive work! Makes you think about whether we’re in a simulation.

lmarena.ai (formerly lmsys.org)

@lmarena_ai

4 months ago

GPT-5 is here - and it’s #1 across the board. 🥇#1 in Text, WebDev, and Vision Arena 🥇#1 in Hard Prompts, Coding, Math, Creativity, Long Queries, and more Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date. Huge congrats to OpenAI on this

Mark Chen

@markchen90

4 months ago

Try GPT-5 today! One of our top level goals was to marry the pretraining and reasoning paradigms - this required lots of hard work from all of our research teams (special shoutout to Max Schwarzer) to make the models smarter, faster, more robust, and more reliable!

Tejal Patwardhan

@tejalpatwardhan

4 months ago

many plots today made me sad. this plot is important though: the models are truly getting more and more practical in the real-world

TBPN

@tbpn

4 months ago

From the archive: We asked @MarkChen (Chief Research Officer at OpenAI) how chasing user metrics could harm AI development. He highlighted how chasing simple metrics like daily active users (DAU) can distort a model’s behavior over time: “If you just boost responses where

Yam Peleg

@yampeleg

4 months ago

GPT-5-pro is great imo, noticeably better than o3-pro

Noam Brown

@polynoamial

4 months ago

In my opinion, the most important takeaway from this result is that our OpenAI International Math Olympiad (IMO) gold model is also our best competitive coding model. 🧵

Jerry Tworek

@millionint

4 months ago

In short 49th to 98th percentile of performance in IOI in one year without training any specialised models. Same RL as for everything else we do

