
vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss with the community!
ID: 1774187564276289536
https://github.com/vllm-project/vllm 30-03-2024 21:31:01
327 Tweets
12.12K Followers
15 Following

⚡The Llama 3.1 series is uniquely challenging due to its long context and large size. We want to thank Red Hat AI (formerly Neural Magic) for their continual stewardship of the quantization code path in vLLM, Anyscale for their high-quality implementation of chunked prefill and speculative decoding,