Cody Yu
@codyhaoyu
MLSys, LLM Serving, Deep Learning Compiler
ID: 836647857490804736
https://www.linkedin.com/in/cody-hao-yu 28-02-2017 18:42:48
106 Tweets
176 Followers
27 Following
We have great projects at Anyscale; come work with us. We've shipped: • Chunked prefill (Sang Cho) • Multi-LoRA (Antoni Baum) • Dynamic spec decode (Lily Liu) • FP8 (Cody Yu) • MoE optimization (Philipp Moritz) • Ray, the distributed compute framework used to train ChatGPT (Robert Nishihara et al.)
A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and 5x lower output latency on Llama 8B, and 1.8x higher throughput and 2x lower latency on Llama 70B, on H100 GPUs. blog.vllm.ai/2024/09/05/per…