vLLM (@vllm_project)'s Twitter Profile
vLLM

@vllm_project

A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss with the community!

ID: 1774187564276289536

Link: https://github.com/vllm-project/vllm · Joined: 30-03-2024 21:31:01

327 Tweets

12.12K Followers

15 Following

vLLM (@vllm_project):

⚡ The Llama 3.1 series is uniquely challenging to serve due to its long context and large size. We want to thank Red Hat AI (formerly Neural Magic) for their continual stewardship of the quantization code path in vLLM, Anyscale for their high-quality implementation of chunked prefill and speculative decoding, …
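
The tweet names two vLLM features that matter for Llama 3.1's long context and large size: quantization and chunked prefill. Below is a minimal sketch of how one might enable both through vLLM's offline LLM API. The model name, context length, and prompt are illustrative assumptions, and exact argument names can differ across vLLM versions.

# Sketch: serving a Llama 3.1 checkpoint with quantization and chunked prefill.
# Assumes a recent vLLM install; flags may vary by version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed checkpoint
    quantization="fp8",            # quantize weights to cut memory use
    enable_chunked_prefill=True,   # split long prompts into prefill chunks
    max_model_len=32768,           # cap context below Llama 3.1's 128K to fit memory
)

outputs = llm.generate(
    ["Why is long-context LLM serving memory-hungry?"],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)

Chunked prefill keeps a single very long prompt from monopolizing a scheduling step, so decode requests stay interleaved; quantization shrinks the weight footprint so large models fit on fewer GPUs.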