Evan Mays (@evanon0ping)'s Twitter Profile
Evan Mays

@evanon0ping

preparedness @openai

ID: 1668258296

Joined: 13-08-2013 16:41:22

733 Tweets

1.1K Followers

307 Following

Evan Mays (@evanon0ping):

People over-fixate on SWE-bench scores. Every lab runs a different subset of the benchmark, so no one really knows which model is really SOTA. Try new models on your own real-world tasks.

Aidan McLaughlin (@aidan_mclau):

guys, ai progress just isn't slowing down. gpt-5 completes tasks that take 52% longer. trust the exponential. (so i don't chartcrime, full plot linked below)

Stephen McAleer (@mcaleerstephen):

We've entered a new phase where progress in chatbots is starting to top out but progress in automating AI research is steadily improving. It's a mistake to confuse the two.

Evan Mays (@evanon0ping):

PSA: check your surroundings before opening the Waymo door to get out. Mine parked next to a pole today and the door hit the pole.

swyx (@swyx):

x.com/Smol_AI/status… paper doesn't mention "AGI" but if you consider that we used to define AGI as "outperform humans at most economically valuable work" then surely GDPVal is the most direct AGI benchmark we have ever had and we are between 77-95% of the way there and should

John Kim (@john_sungjin):

Today, we're launching Village to let investors orchestrate teams of agents to scale their judgment. Three years ago, we bet that both humans and agents would need new tools to fulfill the promise of LLMs to transform research. Village is that tool, the first IDE for research

Olivia Li (@oliviali_):

We’ve been cooking up some pretty cool tech. If you’re an engineer/scientist who wants to work on quite literally world-changing work, we’re hiring.

Evan Mays (@evanon0ping):

Most open-source releases are overhyped; we should doubt any lab’s staying power unless it consistently ships bleeding-edge models.

Rhythm Garg (@rhythmrg):

Excited to share what Yash Patil, Linden Li, and I have been up to since OpenAI: Applied Compute. Companies like Cognition, DoorDash, and Mercor have already captured the initial gains from generalist models. They’re now pulling ahead with Specific Intelligence: custom