Yushun Zhang
@ericzhang0410
PhD student at The Chinese University of Hong Kong, Shenzhen, China.
Working on optimization and LLMs. zyushun.github.io
ID: 1239780017040580610
17-03-2020 05:06:12
326 Tweets
279 Followers
357 Following
Check out this excellent work led by Dmitry Rybin! We discovered a new algorithm to compute the matrix product XX^t with 5% fewer multiplications
Holy shit. Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spikes. Muon has officially scaled to the 1-trillion-parameter LLM level. Many doubted it could scale, but here we are. So proud of the Muon team: Keller Jordan, Vlado Boza, You Jiacheng,
Awesome! Kaiyue Wen, this is related to our earlier discussion.