ethan (@ethanlabs__)'s Twitter Profile
ethan

@ethanlabs__

Interested in ML research and engineering. Subtopics include multimodal interpretability, NN training dynamics, and optimization.

ID: 1952823645090529283

Joined: 05-08-2025 20:07:00

7 Tweets

4 Followers

33 Following

ethan (@ethanlabs__):

I assumed this was the case for a while: that neurons being 'purposefully' superimposed with unrelated concepts, as a result of the n_tokens >> n_params training regime, is the fundamental tradeoff of over-tokened NNs and the mechanism behind their outsized performance. But curious to hear other opinions.
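The superposition picture above can be made concrete with a toy construction (mine, not from the tweet): pack more feature directions than neurons into a layer and measure the interference between unrelated concepts that results.

```python
import numpy as np

# Toy illustration of superposition: store n_features > n_neurons concepts
# as random unit directions in neuron space, then measure the overlap
# ("interference") between supposedly unrelated features.
rng = np.random.default_rng(0)
n_neurons, n_features = 32, 128  # far more features than neurons

# Random feature directions, normalized to unit length.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Gram matrix: diagonal = 1 (each feature reads itself back perfectly);
# off-diagonal entries are the interference cost of sharing neurons.
G = W @ W.T
interference = np.abs(G - np.eye(n_features))

print(f"mean interference: {interference.mean():.3f}")
print(f"max interference:  {interference.max():.3f}")
```

The off-diagonal overlaps concentrate around 1/sqrt(n_neurons): the features are nearly orthogonal but not exactly, which is the tradeoff the tweet gestures at.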

ethan (@ethanlabs__):

I think there's an explanation for Keller Jordan's Muon's performance from a mech interp perspective: specifically, that the orthonormalized gradient updates reduce superposition/feature interference. If anyone's interested, I can post more experiment details.
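For context, the orthonormalization the tweet refers to is Muon's Newton-Schulz iteration, which pushes a gradient/momentum matrix G toward the nearest semi-orthogonal matrix (UV^T from the SVD G = USV^T). The quintic coefficients below follow Keller Jordan's Muon writeup; treat this as an illustrative reimplementation in NumPy, not the official code.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately replace G's singular values with ~1 (semi-orthogonalize)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from Muon
    X = G / (np.linalg.norm(G) + eps)  # scale so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # work with the wide orientation so X @ X.T is small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X  # quintic Newton-Schulz step
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.normal(size=(16, 64))
O = newton_schulz_orthogonalize(G)
# Singular values end up clustered near 1, so every update direction
# contributes with roughly equal magnitude.
print(np.linalg.svd(O, compute_uv=False))
```

Flattening the singular-value spectrum this way is what makes the mech-interp reading plausible: no single direction in the update dominates, which could plausibly spread features more evenly and reduce interference, as the tweet hypothesizes.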