Simran Arora (@simran_s_arora) 's Twitter Profile
Simran Arora

@simran_s_arora

cs @StanfordAILab @hazyresearch

ID: 4712264894

Link: https://arorasimran.com/ · Joined: 05-01-2016 06:18:44

311 Tweets

3.3K Followers

193 Following


There’s been tons of work on KV-cache compression and KV-cache-free Transformer alternatives (SSMs, linear attention) for long context, but we know there’s no free lunch with these methods. The quality–memory tradeoffs are annoying. *Is all lost?* Introducing CARTRIDGES: