ethan (@ethanlabs__)'s Twitter Profile
ethan

@ethanlabs__

Interested in ML research and engineering. Subtopics include multimodal interpretability, NN training dynamics, and optimization.

ID: 1952823645090529283

Joined: 05-08-2025 20:07:00

7 Tweets

4 Followers

33 Following

ethan (@ethanlabs__):

I assumed this was the case for a while: that neurons being 'purposefully' superimposed with unrelated concepts, as a result of the n_tokens >> n_params training regime, is the fundamental tradeoff of over-tokened NNs and the mechanism behind their outsized performance. But curious to hear other opinions.
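The superposition picture above can be made concrete with a toy construction (mine, not from the tweet): pack more feature directions than neurons into a layer and measure the interference between unrelated concepts that results.

```python
import numpy as np

# Toy illustration of superposition: store n_features > n_neurons concepts
# as random unit directions in neuron space, then measure the overlap
# ("interference") between supposedly unrelated features.
rng = np.random.default_rng(0)
n_neurons, n_features = 32, 128  # far more features than neurons

# Random feature directions, normalized to unit length.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Gram matrix: diagonal = 1 (each feature reads itself back perfectly);
# off-diagonal entries are the interference cost of sharing neurons.
G = W @ W.T
interference = np.abs(G - np.eye(n_features))

print(f"mean interference: {interference.mean():.3f}")
print(f"max interference:  {interference.max():.3f}")
```

The off-diagonal overlaps concentrate around 1/sqrt(n_neurons): the features are nearly orthogonal but not exactly, which is the tradeoff the tweet gestures at.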

ethan (@ethanlabs__):

I think there's an explanation for Keller Jordan's Muon's performance from a mech interp perspective: specifically, that the orthonormalized gradient updates reduce superposition/feature interference. If anyone's interested, I can post more experiment details.
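For context, the orthonormalization the tweet refers to is Muon's Newton-Schulz iteration, which pushes a gradient/momentum matrix G toward the nearest semi-orthogonal matrix (UV^T from the SVD G = USV^T). The quintic coefficients below follow Keller Jordan's Muon writeup; treat this as an illustrative reimplementation in NumPy, not the official code.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately replace G's singular values with ~1 (semi-orthogonalize)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from Muon
    X = G / (np.linalg.norm(G) + eps)  # scale so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # work with the wide orientation so X @ X.T is small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X  # quintic Newton-Schulz step
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.normal(size=(16, 64))
O = newton_schulz_orthogonalize(G)
# Singular values end up clustered near 1, so every update direction
# contributes with roughly equal magnitude.
print(np.linalg.svd(O, compute_uv=False))
```

Flattening the singular-value spectrum this way is what makes the mech-interp reading plausible: no single direction in the update dominates, which could plausibly spread features more evenly and reduce interference, as the tweet hypothesizes.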