Kelly Buchanan (@ekellbuch) Twitter Tweets • TwiCopy

Andrej Karpathy

2 months ago

The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal

Karan Goel

@krandiash

2 months ago

At Cartesia, we've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. Intelligence that can interact and reason over massive amounts of context over decade-long timescales. This research is an important step in our

Keyon Vafa

@keyonv

2 months ago

Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: a transformer trained on 10M solar systems nails planetary orbits, but it botches gravitational laws 🧵

Andrej Karpathy

@karpathy

2 months ago

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly

Jason Wei

@_jasonwei

2 months ago

New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally. Great examples of

Noah Smith 🐇

@noahpinion

2 months ago

The whole human race is going to disappear because of smartphones

Azalia Mirhoseini

@azaliamirh

2 months ago

Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!

ClaudeCode

@claude_code

2 months ago

Tip: Sniffly (Chip Huyen) for Claude Code dashboard featuring usage stats, detailed error analysis, and insights. Open-sourced on GitHub. 1) The biggest type of errors Claude Code made is Content Not Found (20 - 30%). It tries to find files or functions that don't exist. So I

Brando Miranda

@brandohablando

2 months ago

🚨 Can your LLM really do math, or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced-mathematics, contamination-resilient benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14

will brown

@willccbb

2 months ago

can't stop thinking about this one. insanely elegant, seems insanely powerful

Dev Valladares

@dev_valladares

a month ago

Infinite Wiki ⁕ Every word is a hyperlink. Every description is generated in real-time (in ~1 second) ⁕ Runs on Gemini 2.5 Flash Lite. ASCII diagrams using 2.5 Flash

Alexander Wei

@alexwei_

a month ago

1/N I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

SkalskiP

@skalskip92

a month ago

supervision has gained almost 1k stars since the release on wednesday; so cool link: github.com/roboflow/super…

Surya Ganguli

@suryaganguli

a month ago

One way to think about it: I like exercising - lifting some weights & running. But a crane lifts more than me, and a car goes faster than me. This takes nothing from the sheer human joy of exercise. Also fast cars add to our joy of superhuman speed. Same w/ math. And chess & go.

Kelly Buchanan

@ekellbuch

a month ago

Agreed. Data cleaning is high return on investment.

Jack Lindsey

@jack_w_lindsey

a month ago

We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…

Chujie Zheng

@chujiezheng

a month ago

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…

Grant Sanderson

@3blue1brown

a month ago

New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY Produced by Welch Labs, this is the first in a small series of 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.

Tilde

@tilderesearch

a month ago

Mixture‑of‑Experts (MoE) powers many frontier models like R1, K2, & Qwen3 ⚡️ To make frontier-scale MoE models accessible to train, we open-source MoMoE, a hyper-performant MoE implementation built for training and inference, outpacing the fastest existing ones by up to: - 70%

Audrey Crews

@neuranova9

a month ago

I tried writing my name for the first time in 20 years. I'm working on it. Lol #Neuralink
