Rana Shahout
@rana_shahout
Postdoc at Harvard | Computer Science
ID: 1448989776602873884
https://sites.google.com/view/ranash 15-10-2021 12:31:09
101 Tweets
209 Followers
368 Following
We are presenting “Prefix and output length-aware scheduling for efficient online LLM inference” at the Sparsity in LLMs workshop at ICLR 2025. 🪫 Challenge: LLM inference in data centers benefits from data parallelism. How can we exploit patterns in