vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!
ID: 1774187564276289536
https://github.com/vllm-project/vllm 30-03-2024 21:31:01
327 Tweets
12.12K Followers
15 Following
🚀 DeepSeek-OCR, the new frontier of OCR from DeepSeek exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping …
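For reference, a minimal offline-inference sketch of running an OCR-style vision model with vLLM's Python API. The Hugging Face model id ("deepseek-ai/DeepSeek-OCR"), the prompt template, the image path, and the sampling values below are illustrative assumptions, not details from the tweet:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Assumed model id; check the model card for the exact name and prompt format.
llm = LLM(model="deepseek-ai/DeepSeek-OCR", trust_remote_code=True)

# Greedy decoding is a reasonable default for OCR-style transcription.
sampling = SamplingParams(temperature=0.0, max_tokens=1024)

# Multimodal request: the page image is passed alongside a text instruction.
image = Image.open("page.png")  # placeholder input image
outputs = llm.generate(
    {
        "prompt": "<image>\nFree OCR.",  # assumed prompt template
        "multi_modal_data": {"image": image},
    },
    sampling,
)

print(outputs[0].outputs[0].text)
```

The same model can also be served over HTTP with `vllm serve <model>` and queried through the OpenAI-compatible API; the offline `LLM` interface above is just the shortest way to reproduce a single-GPU throughput check like the one quoted in the tweet.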