ethan
@ethanlabs__
interested in ml research and engineering. Subtopics include multimodal interp, nn training dynamics, and optimization.
ID: 1952823645090529283
05-08-2025 20:07:00
7 Tweets
4 Followers
33 Following
I think there's an explanation for the performance of Keller Jordan's Muon from a mech interp perspective: specifically, that the orthonormalized gradient updates reduce superposition/feature interference. If anyone's interested, I can post more experiment details.
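For anyone unfamiliar with what "orthonormalized gradient update" means here, a rough sketch. This is not Muon itself: Muon approximates the orthogonalization with a Newton-Schulz iteration, while this sketch swaps in an exact SVD for clarity. The function name `orthogonalize_update` and the random 4x6 "gradient" are made up for illustration, and nothing here tests the superposition claim.

```python
import numpy as np

def orthogonalize_update(grad):
    """Return the nearest semi-orthogonal matrix to `grad`, i.e. U @ Vt from its SVD.

    Illustrative stand-in: exact SVD instead of the Newton-Schulz
    approximation that Muon uses.
    """
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 6))  # hypothetical gradient for a 4x6 weight matrix
update = orthogonalize_update(grad)

# After orthogonalization every singular value of the update is 1,
# so no single direction dominates the step.
singular_values = np.linalg.svd(update, compute_uv=False)
print(singular_values)
```

The point is just that the update keeps the gradient's singular directions but flattens their magnitudes.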