main
@main_horse
Celebrating excellence
ID: 1605840745960591360
https://blog.main.horse 22-12-2022 08:20:58
4.4K Tweets
12.12K Followers
777 Following
zhihu.com/question/19561…
Why dpskv3.2 is exciting for both sparse attn and linear attn communities from Songlin Yang (Alert: this is in Chinese)
the basic summary is:
1. after all, though swa and linear attn are popular, it is still hard to get rid of the full attn layer for