Sagar Joglekar (@sagarjoglekar)'s Twitter Profile
Sagar Joglekar

@sagarjoglekar

ML scientist, curious guy... Trying to figure this thing out on the fly.

ID: 114636884

Link: https://sagarjoglekar.com · Joined: 16-02-2010 03:35:28

4.4K Tweets

363 Followers

365 Following

METR (@metr_evals)'s Twitter Profile Photo

In a new report, we evaluate whether GPT-5 poses significant catastrophic risks via AI R&D acceleration, rogue replication, or sabotage of AI labs. 

We conclude that this seems unlikely. However, capability trends continue rapidly, and models display increasing eval awareness.
Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

It’s like the whole world is watching two exhausted gladiators in the arena, one “the economy,” the other “the human condition.” In the end, neither survives, cut down either by the blade of overhyped, half-baked technology or by the plutocratic emperor’s cold thumbs-down.

Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

Margins have started to matter now: the unified UI, even for paying users, is a giant canary in the coal mine for OpenAI's balance sheet. They are for sure losing money on most models, and dynamic routing to weaker models is the only way to dig themselves out.

Shai Shalev-Shwartz (@shai_s_shwartz)'s Twitter Profile Photo

Are frontier AI models really capable of “PhD-level” reasoning? To answer this question, we introduce FormulaOne, a new reasoning benchmark of expert-level Dynamic Programming problems. We have curated a benchmark consisting of three tiers, in increasing complexity, which we call

elvis (@omarsar0)'s Twitter Profile Photo

AI Agents are terrible at long-horizon tasks.

Even the new GPT-5 model struggles with long-horizon tasks.

This is one of the most pressing challenges when building AI agents.

Pay attention, AI devs!

This is a neat paper that went largely unnoticed.

Here are my notes:
Dimitris Papailiopoulos (@dimitrispapail)'s Twitter Profile Photo

Thinking about model generalization is quite painful. We observe empirically that models trained with SGD on cross-entropy generalize instead of just memorizing the training data, even when they have sufficient capacity to memorize. We do not---I repeat---we. do. not. have a

Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

Applied researchers should spend some time every year doing pure research… I stumbled upon the trick in this paper while playing with GRPO for a purely applied task: arxiv.org/abs/2508.09726 … They call it group filtering; I called it adding confirmation bias to the policy 😁
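For context, a minimal sketch of what group filtering can look like in GRPO-style training, under the common formulation where prompt groups whose sampled completions all receive the same reward carry no group-relative advantage and are dropped before the policy update. Function and variable names below are illustrative, not taken from the linked paper.

```python
import numpy as np

def grpo_advantages_with_group_filtering(groups, eps=1e-6):
    """Group-relative advantages for a GRPO-style policy update.

    Each element of `groups` holds several sampled completions for one prompt
    plus their scalar rewards. Groups whose rewards are (nearly) identical
    provide no relative signal, so they are dropped ("group filtering") before
    rewards are normalised within each surviving group.
    """
    kept = []
    for g in groups:
        rewards = np.asarray(g["rewards"], dtype=np.float64)
        if rewards.std() < eps:        # degenerate group: every reward the same
            continue                   # filter it out entirely
        advantages = (rewards - rewards.mean()) / (rewards.std() + eps)
        kept.append({"completions": g["completions"], "advantages": advantages})
    return kept

# Toy usage: the first group is uniform (all reward 1.0) and is filtered out;
# only the second group contributes advantages to the policy-gradient step.
batch = [
    {"completions": ["resp_a", "resp_b"], "rewards": [1.0, 1.0]},
    {"completions": ["resp_c", "resp_d"], "rewards": [0.0, 1.0]},
]
print(grpo_advantages_with_group_filtering(batch))
```

The "confirmation bias" framing in the tweet is intuitive here: by discarding groups with no reward spread, the update is driven only by prompts where the policy already produces a mix of better and worse answers.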

Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

The emerald green coat that this beautiful city wears in monsoon makes your heart skip a beat, and a strange sense of longing for a life not lived sets in… But then you plan to go somewhere to enjoy it, and that sense vanishes as quickly as it set in.

François Chollet (@fchollet)'s Twitter Profile Photo

LLM adoption among US workers is closing in on 50%. Meanwhile, labor productivity growth is lower than in 2020. Many counter-arguments can be made here, e.g. "they don't know yet how to be productive with it, they've only been using it for 1-2 years", "50% is still too low to see

Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

Extrinsic value NEVER aligns with intrinsic worth. Peace lies in accepting this, grounding the latter, and acting only on what you can control.

John Burn-Murdoch (@jburnmurdoch)'s Twitter Profile Photo

Very important paper, for two reasons:

1) Key finding: employment *is* falling in early-career roles exposed to LLM automation

2) Shows that administrative data (millions of payroll records) is much better than survey data for questions requiring precision (occupation x age)

Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

This is truly tragic… a beautiful, thought-provoking piece of art and social commentary, lost. When the jester is banished from the court, it’s the subjects who should be concerned.

Sagar Joglekar (@sagarjoglekar)'s Twitter Profile Photo

The UK should expand its global talent visa scheme. Everyone affected by Trump’s EO should automatically get it… This would catalyse the startup sector here and save millions in vetting for the UK Home Office, while bringing in legitimate tax-paying labour.

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

"AI isn't replacing radiologists" good article Expectation: rapid progress in image recognition AI will delete radiology jobs (e.g. as famously predicted by Geoff Hinton now almost a decade ago). Reality: radiology is doing great and is growing. There are a lot of imo naive