jianlin.su (@jianlin_s)'s Twitter Profile
jianlin.su

@jianlin_s

Grad is all you need @Kimi_Moonshot

ID: 1893313742374625280

Link: https://jianlin.su
Joined: 22-02-2025 14:55:56

13 Tweets

648 Followers

6 Following

kexue.fm/archives/11196 This series opener explores the steepest-descent direction for equality-constrained optimization, starting with an SGD variant tailored to the hypersphere constraint.

A fun fact: Adam remains the dominant optimizer today, yet even it has had only scant opportunities to be verified on trillion-parameter models; Muon, proposed less than a year ago, has already trained at that scale.

Beyond MuP: 1. The Self-Cultivation of a Good Model kexue.fm/archives/11340 This series will share some top-down attempts at model optimization -- an extension and expansion of the ideas of MuP and Muon.