We are excited to announce FlashInfer v0.2!
Core contributions of this release include:
- Block/Vector Sparse (Paged) Attention on FlashAttention-3 (a conceptual sketch follows this list)
- JIT compilation for customized attention variants
- Fused Multi-head Latent Attention (MLA) decoding kernel
- Lots of bug fixes and other improvements
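To make the first item concrete, here is a minimal PyTorch sketch of what block-sparse attention computes: each query block attends only to the key/value blocks selected by a block mask, instead of the full sequence. This is a conceptual reference, not FlashInfer's API; the function name, block size, and mask layout are illustrative assumptions.

```python
import torch

def block_sparse_attention(q, k, v, block_mask, block_size=64):
    """Reference block-sparse attention for a single head.

    q: [M, d], k/v: [N, d]
    block_mask: [M // block_size, N // block_size] boolean; True means the
    query block attends to that key/value block.
    """
    M, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    for qb in range(M // block_size):
        q_blk = q[qb * block_size:(qb + 1) * block_size]  # [block_size, d]
        # Keep only the key/value blocks this query block attends to.
        kv_blocks = block_mask[qb].nonzero(as_tuple=False).flatten().tolist()
        if not kv_blocks:
            continue
        k_sel = torch.cat([k[i * block_size:(i + 1) * block_size] for i in kv_blocks])
        v_sel = torch.cat([v[i * block_size:(i + 1) * block_size] for i in kv_blocks])
        attn = torch.softmax((q_blk @ k_sel.T) * scale, dim=-1)
        out[qb * block_size:(qb + 1) * block_size] = attn @ v_sel
    return out

# Tiny usage example: 2 query blocks x 4 key/value blocks with a random mask.
q = torch.randn(128, 64)
k = torch.randn(256, 64)
v = torch.randn(256, 64)
mask = torch.rand(2, 4) > 0.5
o = block_sparse_attention(q, k, v, mask, block_size=64)
```

The payoff of the fused kernels in this release is that the sparse gather and the attention computation happen in one pass on the GPU, rather than materializing the selected blocks as in this reference loop.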