Huiqiang Jiang (@iofu728) 's Twitter Profile
Huiqiang Jiang

@iofu728

RSDE @MSFTResearch Shanghai

ID: 1156745200770768896

Link: https://hqjiang.com/ · Joined: 01-08-2019 01:55:21

107 Tweets

248 Followers

547 Following

elvis (@omarsar0) 's Twitter Profile Photo


A KV Cache-Centric Analysis of Long-Context Methods

Evaluates long-context methods from a KV cache-centric perspective: 1) KV cache generation, 2) KV cache compression, 3) KV cache retrieval, 4) KV cache loading.

The paper reports some interesting findings. For instance, they
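The four lifecycle stages in the tweet's framing can be illustrated with a minimal decode loop that reuses cached keys/values. This is a hedged NumPy sketch: the shapes, the key-norm compression heuristic, and the similarity-based retrieval rule are illustrative assumptions, not the paper's actual methods.

```python
import numpy as np

def attend(q, K, V):
    # Single-query softmax attention over cached keys/values.
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 8
rng = np.random.default_rng(0)
# 1) KV cache generation: the cache grows as tokens are processed.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for _ in range(16):
    k, v, q = rng.normal(size=(3, d))
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)

# 2) KV cache compression: keep only 4 entries (toy key-norm heuristic).
keep = np.argsort(np.linalg.norm(K_cache, axis=1))[-4:]
K_small, V_small = K_cache[keep], V_cache[keep]

# 3) KV cache retrieval: select entries most relevant to the current query.
retrieved = np.argsort(K_cache @ q)[-4:]

# 4) KV cache loading: in a real system the selected KV blocks would be
#    moved from CPU/disk into GPU memory before attention runs.
out_small = attend(q, K_small, V_small)
print(K_cache.shape, K_small.shape)  # (16, 8) (4, 8)
```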
Huiqiang Jiang (@iofu728) 's Twitter Profile Photo

SCBench has been accepted by #ICLR2025! Now you can evaluate your long-context methods across the full KV cache lifecycle. Congratulations to Yucheng Li and all my co-authors! Find more details at aka.ms/SCBench

Huiqiang Jiang (@iofu728) 's Twitter Profile Photo

Great work! 🚀 Exciting to see MInference deployed on servers, and the integration of chunked prefill, dynamic sparsity, and DCA open-sourced along with the vLLM implementation! Thank you for your incredible work 🥁

Huiqiang Jiang (@iofu728) 's Twitter Profile Photo

🔥 Excellent work on dynamic sparse attention, especially its performance improvement in long CoT! Looking forward to your next release!

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo


Microsoft presents MMInference

- Accelerates pre-filling for long-context VLMs via modality-aware permutation
- Accelerates 8.3x at 1M tokens while maintaining accuracy
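The "modality-aware permutation" idea can be sketched in a few lines: reorder an interleaved text/vision token sequence so that same-modality tokens become contiguous, which lets a kernel operate on dense per-modality blocks, then invert the permutation afterward. This is a generic illustration under assumed toy labels, not MMInference's actual algorithm.

```python
import numpy as np

# Interleaved sequence: 0 = text token, 1 = vision token (toy labels).
modality = np.array([0, 1, 1, 0, 1, 0, 0, 1])
tokens = np.arange(len(modality))          # stand-ins for token embeddings

# Modality-aware permutation: group each modality into a contiguous block.
perm = np.argsort(modality, kind="stable")  # stable sort keeps within-modality order
permuted = tokens[perm]
print(permuted)  # [0 3 5 6 1 2 4 7] -> text block first, then vision block

# After (block-sparse) attention, the inverse permutation restores order.
inv = np.argsort(perm)
restored = permuted[inv]
assert np.array_equal(restored, tokens)
```

Grouping by modality matters because attention patterns within one modality tend to be more regular than across the interleaved sequence, so the permuted layout maps better onto hardware-friendly dense tiles.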
Huiqiang Jiang (@iofu728) 's Twitter Profile Photo


✈️to ICLR'25. Looking forward to meeting you all and discussing efficient LLMs. #ICLR25   

- (Apr. 24 10:00 #291) SCBench aka.ms/SCBench
- (Apr. 25 13:30-14 Microsoft Booth) Efficient Long-context Methods 
- (Apr. 26 15:00-17:30 #58) SeCom aka.ms/SeCom
Huiqiang Jiang (@iofu728) 's Twitter Profile Photo

Thanks Aran Komatsuzaki for the promotion! MMInference, a bottom-up system–algorithm co-designed sparse attention method, lets long-context VLMs process 1M-token videos 8.3x faster. We'll present it at the ICLR'25 Microsoft Booth (Apr. 25, 13:30). aka.ms/MMInference

Piotr Nawrot (@p_nawrot) 's Twitter Profile Photo


Sparse attention is one of the most promising strategies to unlock long-context processing and long generation reasoning in LLMs.

We performed the most comprehensive study on training-free sparse attention to date.

Here is what we found:
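One common training-free pattern in this space is to restrict each query to its top-k highest-scoring keys at inference time, with no retraining. The sketch below is a generic illustration of that pattern in NumPy, not any specific method from the study; the shapes and `k` value are assumptions.

```python
import numpy as np

def sparse_attention(Q, K, V, k=4):
    """Training-free top-k sparse attention: each query attends only to
    its k highest-scoring keys (generic sketch, not a specific paper)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                  # (n_q, n_k)
    # Per-row threshold: the k-th largest score in each query row.
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)        # drop the rest
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
Q = rng.normal(size=(6, 8))    # 6 queries, head dim 8
K = rng.normal(size=(16, 8))   # 16 cached keys
V = rng.normal(size=(16, 8))
out = sparse_attention(Q, K, V, k=4)  # each query touches 4 of 16 keys
print(out.shape)  # (6, 8)
```

In practice such methods usually select contiguous blocks of keys rather than individual ones, so the saved work translates into actual kernel speedups.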
Hanshi Sun (@preminstrel) 's Twitter Profile Photo


🎉 Thrilled to announce our ShadowKV has been accepted to #ICML2025 as a ✨Spotlight Presentation❗️

❓Facing challenges with high-throughput long-context LLM serving? ShadowKV is here to help! 

🚀 Achieves memory-efficient & high-throughput inference via sparse attention.
Huiqiang Jiang (@iofu728) 's Twitter Profile Photo

MMInference has been accepted to #ICML2025! It uses permutation to address inductive-bias and modality-boundary issues in multi-modal inputs, and unifies dynamic sparse attention in a sparse-load + dense-tensor-core pipeline. Congratulations to Yucheng Li! Find more at aka.ms/MMInference

Cognition (@cognition_labs) 's Twitter Profile Photo


Our research interns present:
Kevin-32B = K(ernel D)evin

It's the first open model trained using RL for writing CUDA kernels. We implemented multi-turn RL using GRPO (based on QwQ-32B) on the KernelBench dataset.

It outperforms top reasoning models (o3 & o4-mini)! 🧵