Zheng Zhan
@zhengzhan13
Researcher @MSFTResearch
ID: 1498189775005749248
https://zhanzheng8585.github.io/
28-02-2022 06:54:28
28 Tweets
30 Followers
62 Following
Is RoPE necessary? Can sigmoid attention outperform softmax? Can we design positional encoding for seamless length extrapolation? Join ASAP seminar 03: Shawn Tan & Yikang Shen present Stick-Breaking Attention (openreview.net/forum?id=r8J3D…), a new RoPE-free sigmoid attention mechanism with strong length extrapolation!
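Rough idea in code: a minimal single-head PyTorch sketch of the stick-breaking formulation (shapes, scaling, and the numerical clamp are assumptions here, not the authors' implementation). Each key j < i claims a sigmoid-gated fraction of the attention mass left unclaimed by more recent keys, so there is no softmax normalization and position is encoded by the recency ordering itself rather than by RoPE.

```python
import torch

def stick_breaking_attention(q, k, v):
    """Minimal single-head stick-breaking attention sketch.

    q, k, v: (T, d) tensors. For each query i, keys are consumed from
    most recent (j = i-1) backwards: key j receives weight
        beta[i, j] * prod_{j < m < i} (1 - beta[i, m]),
    where beta[i, j] = sigmoid(q_i . k_j / sqrt(d)).
    """
    T, d = q.shape
    logits = (q @ k.T) / d ** 0.5                        # (T, T) query-key logits
    # Strictly causal mask: query i may attend to positions j < i only.
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool), diagonal=-1)
    beta = torch.sigmoid(logits) * causal
    # log(1 - beta); the clamp is a numerical-safety assumption.
    log_rem = torch.log1p(-beta.clamp(max=1 - 1e-6)) * causal
    # weight[i, j] needs sum of log(1 - beta[i, m]) over j < m < i:
    # reverse cumulative sum over the key axis, minus the j-th term itself.
    suffix_sum = log_rem.flip(-1).cumsum(-1).flip(-1) - log_rem
    weights = beta * torch.exp(suffix_sum) * causal      # rows need not sum to 1
    return weights @ v
```

Because rows need not sum to one, a query can leave part of the stick unallocated, i.e., attend to almost nothing; that is a deliberate departure from softmax's forced normalization.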
The Belief State Transformer (edwardshu.com/bst-website/) is at ICLR this week. The BST objective efficiently creates compact belief states: summaries of the past that are sufficient for all future predictions. See the short talk (microsoft.com/en-us/research…) and mgostIH for further discussion.
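A rough sketch of the objective described above (class names, layer counts, and head design are placeholders, not the paper's implementation): encode the prefix forward and the suffix backward, then train two heads to predict the token just after the prefix and the token just before the suffix.

```python
import torch
import torch.nn as nn

class BSTSketch(nn.Module):
    """Hypothetical sketch of the BST objective.

    A forward encoder summarizes the prefix, a backward encoder summarizes
    the reversed suffix; their concatenation is the belief state, from which
    two heads predict the token right after the prefix and the token right
    before the suffix.
    """

    def __init__(self, vocab=1000, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.fwd = nn.TransformerEncoder(layer, num_layers=2)  # reads prefix left-to-right
        self.bwd = nn.TransformerEncoder(layer, num_layers=2)  # reads suffix right-to-left
        self.next_head = nn.Linear(2 * d, vocab)  # predicts token after the prefix
        self.prev_head = nn.Linear(2 * d, vocab)  # predicts token before the suffix

    def forward(self, prefix, suffix):
        # prefix: (B, Tp) token ids, suffix: (B, Ts) token ids
        f = self.fwd(self.embed(prefix))[:, -1]          # forward summary of the past
        b = self.bwd(self.embed(suffix.flip(1)))[:, -1]  # backward summary of the future
        state = torch.cat([f, b], dim=-1)                # compact belief state
        return self.next_head(state), self.prev_head(state)
```

Training would sample (prefix, suffix) splits of each sequence and apply cross-entropy to both heads; the concatenated summary plays the role of the compact belief state.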
Xinyu Yang will be presenting this amazing work at the ASAP seminar tomorrow! Do not miss his talk.