@yikang_shen: It's good to see DeepSeek-V3 draw everyone's attention to reducing the training cost of LLMs. Over the last two years, we found that you can drastically reduce the cost of an LLM at every step of its training, including 1) hyper-parameter search/scaling law experiments, 2) model

Yikang Shen

@yikang_shen

MTS @xAI. ex Staff RS @IBM. PhD @Mila. Granite LMs, Ordered Neurons, Mixture of Attention Heads, JetMoE, stick-breaking attention and Power LR.

ID: 804168614

Joined: 05-09-2012 08:43:26

220 Tweets

2.2K Followers

370 Following

7 months ago

It's good to see DeepSeek-V3 draw everyone's attention to reducing the training cost of LLMs. Over the last two years, we found that you can drastically reduce the cost of an LLM at every step of its training, including 1) hyper-parameter search/scaling law experiments, 2) model

Likes: 241

Replies: 2

Reposts: 52
