
Evan Miller
@evmill
Statistically inclined software developer, occasional blogger about math + stats stuff. Working on evals @AnthropicAI
ID: 42923034
https://www.evanmiller.org/ 27-05-2009 16:55:16
1.1K Tweets
5.5K Followers
197 Following

I hit a bug in the Attention formula that’s been overlooked for 8+ years.

All Transformer models (GPT, LLaMA, etc) are affected.

Researchers isolated the bug last month – but they missed a simple solution…

Why LLM designers should stop using Softmax 👇

evanmiller.org/attention-is-o…
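The linked post argues that standard softmax forces every attention head to distribute a total weight of exactly 1 across tokens, so a head can never "abstain" when no token is relevant. The proposed fix is a softmax variant with an extra 1 in the denominator (equivalent to a virtual logit of 0), letting the weights sum to less than 1. A minimal sketch, assuming that variant (the function name `softmax_one` is illustrative):

```python
import numpy as np

def softmax(x):
    # Standard softmax: outputs always sum to exactly 1, so an attention
    # head must attend to *something* even when no key is relevant.
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def softmax_one(x):
    # Variant with +1 in the denominator: exp(x_i) / (1 + sum_j exp(x_j)).
    # Equivalent to appending a virtual logit of 0; weights can now sum
    # to less than 1, allowing the head to stay (near-)silent.
    m = np.max(x)
    e = np.exp(x - m)
    return e / (e.sum() + np.exp(-m))  # exp(-m) is the rescaled "+1" term

x = np.array([1.0, 2.0, 3.0])
print(softmax(x).sum())      # exactly 1 (up to float error)
print(softmax_one(x).sum())  # strictly less than 1
```

With very negative logits (nothing worth attending to), `softmax_one` emits near-zero total weight, while standard softmax would still be forced to output weights summing to 1.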