Jeremy Bernstein (@jxbz)'s Twitter Profile
Jeremy Bernstein

@jxbz

☎️ silicon valley tech support @ modula.systems
✍️ anon feedback @ admonymous.co/jxbz

ID: 103996493

Website: http://jeremybernste.in · Joined: 11-01-2010 23:05:07

947 Tweets

4.4K Followers

561 Following

TianyLin (@tianylin)'s Twitter Profile Photo

Announcing 𝐟𝐥𝐚𝐬𝐡-𝐦𝐮𝐨𝐧: a 🐍 pkg with customized CUDA kernels that aims to boost the Muon optimizer: github.com/nil0x9/flash-m… 1/n

Kimi.ai (@kimi_moonshot)'s Twitter Profile Photo

🚀 Hello, Kimi K2!  Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now

With Kimi K2, advanced agentic intelligence
Yuchen Jin (@yuchenj_uw)'s Twitter Profile Photo

Holy shit.

Kimi K2 was pre-trained on 15.5T tokens using MuonClip with zero training spike.

Muon has officially scaled to the 1-trillion-parameter LLM level. Many doubted it could scale, but here we are.

So proud of the Muon team: <a href="/kellerjordan0/">Keller Jordan</a>, <a href="/bozavlado/">Vlado Boza</a>, <a href="/YouJiacheng/">You Jiacheng</a>,
Soumith Chintala (@soumithchintala)'s Twitter Profile Photo

considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default.
If anyone wants to take a crack at it... 
github.com/pytorch/pytorc…
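For context on what these tweets are celebrating: the core of the Muon optimizer is momentum SGD whose update direction is orthogonalized by a Newton-Schulz iteration. Below is a minimal NumPy sketch under that description; the quintic coefficients follow Keller Jordan's public reference implementation, while the function names and hyperparameter defaults here are illustrative, not the PyTorch or Moonshot code.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately map G to the nearest semi-orthogonal matrix
    (U @ V.T from its SVD) via a quintic Newton-Schulz iteration.
    Coefficients follow the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # scale so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate in the wide orientation so X @ X.T is smaller
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One Muon update for a 2-D weight matrix: accumulate momentum,
    orthogonalize it, and take a step in that direction."""
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz_orthogonalize(momentum)
    return W, momentum
```

The iteration pushes all singular values of the update toward 1 without an explicit SVD, which is what makes it cheap enough to fuse into a CUDA kernel (the point of flash-muon above). MuonClip, used for Kimi K2, layers an additional qk-clip stabilization on top of this base update.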