Mr. Agent
@agenticai
Creator of new things.
ID: 1794485331883892736
25-05-2024 21:47:12
84 Tweets
66 Followers
270 Following
Did you know your LLM uses less than 1% of your GPU at inference? Most of that time is wasted on KV cache memory access ➡️ We tackle this with the 🎁 Block Transformer: a global-to-local architecture that speeds up decoding by up to 20x 🚀 KAIST AI, LG AI Research w/ Google DeepMind 🧵
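To make the global-to-local idea concrete, here is a minimal PyTorch sketch of the kind of split the tweet describes: a global decoder attends across coarse block embeddings while a lightweight local decoder attends only within each block, so the expensive global KV cache grows with the number of blocks rather than the number of tokens. This is not the authors' released implementation; the module names, dimensions, block length, and the simple concatenation/shift scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn


def causal_mask(size: int) -> torch.Tensor:
    # Standard upper-triangular additive mask: position i cannot attend to j > i.
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)


class BlockTransformerSketch(nn.Module):
    """Hypothetical global-to-local decoder: block-level attention over coarse
    block embeddings, followed by cheap token-level attention within each block."""

    def __init__(self, vocab_size=32000, d_model=512, block_len=4,
                 n_global_layers=4, n_local_layers=4, n_heads=8):
        super().__init__()
        self.block_len = block_len
        self.embed = nn.Embedding(vocab_size, d_model)
        # Embedder (assumed): concatenate a block of token embeddings into one block embedding.
        self.block_proj = nn.Linear(block_len * d_model, d_model)
        # Global (block) decoder: attends across block embeddings only, so its
        # KV cache scales with the number of blocks, not the number of tokens.
        self.global_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True),
            n_global_layers)
        # Local (token) decoder: attends within a single block, conditioned on
        # the context embedding produced by the global decoder.
        self.local_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True),
            n_local_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len), seq_len assumed to be a multiple of block_len.
        b, t = tokens.shape
        n_blocks = t // self.block_len
        x = self.embed(tokens)                                        # (b, t, d)
        block_emb = self.block_proj(
            x.view(b, n_blocks, self.block_len * x.size(-1)))         # (b, n_blocks, d)
        ctx = self.global_decoder(block_emb, mask=causal_mask(n_blocks))
        # Shift contexts so tokens in block i only see blocks < i (no leakage);
        # the first block receives an all-zero context in this sketch.
        ctx = torch.cat([torch.zeros_like(ctx[:, :1]), ctx[:, :-1]], dim=1)
        ctx_tok = ctx.repeat_interleave(self.block_len, dim=1)        # (b, t, d)
        h = (x + ctx_tok).view(b * n_blocks, self.block_len, -1)
        h = self.local_decoder(h, mask=causal_mask(self.block_len))
        return self.lm_head(h.reshape(b, t, -1))


model = BlockTransformerSketch()
logits = model(torch.randint(0, 32000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 32000])
```

The point of the split is that per-token decoding only touches a short local cache plus one precomputed block context, which is where the claimed savings on KV cache memory access would come from.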