@xieenze_jr : 🚀 Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding 💥 Key Features🌟 - Block-Wise KV Cache Reuses 90%+ attention activations via bidirectional caching (prefix/suffix), enabling 8.1×–27.6× throughput gains with <2% accuracy loss 🔄

Enze Xie

@xieenze_jr


Sr. Research Scientist at NVIDIA, doing GenAI, CS PhD from HKU MMLab, interned at NVIDIA.

ID: 1723702194380427264

https://xieenze.github.io/

12-11-2023 14:00:10

49 Tweets

769 Followers

116 Following


2 months ago

🚀 Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding 💥 Key Features🌟 - Block-Wise KV Cache Reuses 90%+ attention activations via bidirectional caching (prefix/suffix), enabling 8.1×–27.6× throughput gains with <2% accuracy loss 🔄 -

174 likes · 8 replies · 34 reposts
