Yikang Shen (@yikang_shen)'s Twitter Profile
Yikang Shen

@yikang_shen

MTS @xAI. ex Staff RS @IBM. PhD @Mila. Granite LMs, Ordered Neurons, Mixture of Attention Heads, JetMoE, stick-breaking attention and Power LR.

ID: 804168614

Joined: 05-09-2012 08:43:26

220 Tweets

2.2K Followers

370 Following

It's good to see Deepseek v3 draw everyone's attention to reducing the training cost of LLM. Over the last two years, we found that you can drastically reduce the cost of LLM in every step of its training, including 1) hyper-parameter search/scaling law experiments, 2) model
