Kelly Buchanan (@ekellbuch)'s Twitter Profile
Kelly Buchanan

@ekellbuch

Postdoctoral Fellow @Stanford. Reliable AI for science. PhD from @cu_neurotheory @ZuckermanBrain. Industry: Research @GoogleAI

ID: 334922130

Link: http://ekbuchanan.com

Joined: 13-07-2011 21:42:33

599 Tweets

993 Followers

1.1K Following

Andrej Karpathy (@karpathy)

The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal

Karan Goel (@krandiash)

At Cartesia, we've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. Intelligence that can interact and reason over massive amounts of context over decade-long timescales. This research is an important step in our

Keyon Vafa (@keyonv)

Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵

Andrej Karpathy (@karpathy)

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly

Jason Wei (@_jasonwei)

New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally. Great examples of

Azalia Mirhoseini (@azaliamirh)

Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!

ClaudeCode (@claude_code)

Tip: Sniffly (Chip Huyen) for Claude Code dashboard featuring usage stats, detailed error analysis, and insights. Open-sourced on GitHub. 1) The biggest type of errors Claude Code made is Content Not Found (20 - 30%). It tries to find files or functions that don't exist. So I

Brando Miranda (@brandohablando)

🚨 Can your LLM really do math—or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced mathematics contamination-resilient benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14

Dev Valladares (@dev_valladares)

Infinite Wiki ⁕ Every word is a hyperlink. Every description is generated in real-time (in ~1 second) ⁕ Runs on Gemini 2.5 Flash Lite. ASCII diagrams using 2.5 Flash

Alexander Wei (@alexwei_)

1/N I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

Surya Ganguli (@suryaganguli)

One way to think about it: I like exercising - lifting some weights & running. But a crane lifts more than me, and a car goes faster than me. This takes nothing from the sheer human joy of exercise. Also fast cars add to our joy of superhuman speed. Same w/ math. And chess & go.

Jack Lindsey (@jack_w_lindsey)

We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic!  We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…

Chujie Zheng (@chujiezheng)

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…

Grant Sanderson (@3blue1brown)

New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY Produced by Welch Labs, this is the first in a small series of 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.

Tilde (@tilderesearch)

Mixture‑of‑Experts (MoE) powers many frontier models like R1, K2, & Qwen3 ⚡️ To make frontier-scale MoE models accessible to train, we open-source MoMoE, a hyper-performant MoE implementation built for training and inference, outpacing the fastest existing ones by up to: - 70%
