
Yikang Shen
@yikang_shen
MTS @xAI. ex Staff RS @IBM. PhD @Mila. Granite LMs, Ordered Neurons, Mixture of Attention Heads, JetMoE, stick-breaking attention and Power LR.
ID: 804168614
05-09-2012 08:43:26
220 Tweets
2.2K Followers
370 Following



Is RoPE necessary? Can sigmoid attention outperform softmax? Can we design PE for seamless length extrapolation? Join ASAP seminar 03: Shawn Tan & Yikang Shen present Stick-Breaking Attention (openreview.net/forum?id=r8J3D…), a new RoPE-free sigmoid attention with strong extrapolation!
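As a rough illustration of the idea behind the announced talk: in stick-breaking attention, each query gates its keys with a sigmoid and allocates attention mass nearest-key-first via a stick-breaking product, so no softmax normalization or RoPE positional encoding is needed. A minimal NumPy sketch, assuming the weight formulation A[i,j] = beta[i,j] * prod over j<k<i of (1 - beta[i,k]); the function name and shapes here are illustrative, not the paper's reference implementation:

```python
import numpy as np

def stick_breaking_attention(q, k, v):
    """Sketch of stick-breaking attention (causal, single head).

    For each query i, sigmoid gates beta[i, j] on keys j < i are turned
    into weights by breaking off a fraction of the remaining "stick",
    starting from the nearest key and moving backward. The weights sum
    to at most 1 per query, with leftover mass acting as a null option.
    """
    T, d = q.shape
    logits = (q @ k.T) / np.sqrt(d)          # (T, T) scaled dot products
    beta = 1.0 / (1.0 + np.exp(-logits))     # sigmoid gates, no softmax
    A = np.zeros((T, T))
    for i in range(T):
        remaining = 1.0                      # unallocated stick for query i
        for j in range(i - 1, -1, -1):       # nearest key first, then back
            A[i, j] = beta[i, j] * remaining
            remaining *= 1.0 - beta[i, j]
    return A @ v

# Tiny usage example with random projections.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out = stick_breaking_attention(q, k, v)      # (6, 8); row 0 has no keys
```

Because position enters only through the recency-ordered product, the mechanism carries an implicit positional bias, which is the intuition behind the RoPE-free length extrapolation claimed in the tweet.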





