
vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss with the community!
ID: 1774187564276289536
https://github.com/vllm-project/vllm 30-03-2024 21:31:01
327 Tweets
12.12K Followers
15 Following

⚡The Llama 3.1 series is uniquely challenging due to its long context and large size. We want to thank Red Hat AI (formerly Neural Magic) for their continual stewardship of the quantization code path in vLLM, Anyscale for their high-quality implementation of chunked prefill and speculative decoding,