vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!
ID: 1774187564276289536
https://github.com/vllm-project/vllm 30-03-2024 21:31:01
327 Tweets
12.12K Followers
15 Following
🚀 The RL community keeps pushing boundaries — from better on-policy data and partial rollouts to in-flight weight updates that mix KV caches across models during inference. Continuing inference while weights change and KV states stay stale sounds wild — but that’s exactly what
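The idea of continuing to decode while weights are swapped underneath (and the KV cache built under the old weights stays stale) can be illustrated with a toy sketch. The snippet below is not vLLM's API; every name in it (ToyModel, decode_step, swap_weights, generate) is hypothetical, and it only models the bookkeeping: KV entries are tagged with the weight version that produced them, so after an in-flight update the cache mixes versions.

```python
# Minimal, self-contained sketch (hypothetical names, not vLLM's API) of an
# in-flight weight update: decoding continues across a weight swap while the
# KV cache built under the old weights is reused as-is, i.e. it goes stale.

class ToyModel:
    """Stand-in for an LLM: 'weights' is just a version tag, and the
    'KV cache' is a list of (token, weight_version) entries."""

    def __init__(self, version: int = 0):
        self.version = version
        self.kv_cache: list[tuple[int, int]] = []

    def decode_step(self, token: int) -> int:
        # Append this step's KV entry, tagged with the *current* weights.
        self.kv_cache.append((token, self.version))
        # The next token depends on the whole cache, which may mix versions.
        return (token + sum(v for _, v in self.kv_cache)) % 50_000

    def swap_weights(self, new_version: int) -> None:
        # In-flight update: weights change, but the KV cache is NOT
        # recomputed, so earlier entries remain stale.
        self.version = new_version


def generate(model: ToyModel, prompt_token: int, steps: int,
             update_at: int, new_version: int) -> list[int]:
    out, tok = [], prompt_token
    for step in range(steps):
        if step == update_at:
            model.swap_weights(new_version)  # weights change mid-generation
        tok = model.decode_step(tok)
        out.append(tok)
    return out


if __name__ == "__main__":
    m = ToyModel(version=1)
    tokens = generate(m, prompt_token=42, steps=8, update_at=4, new_version=2)
    stale = sum(1 for _, v in m.kv_cache if v != m.version)
    print(f"generated: {tokens}")
    print(f"{stale} of {len(m.kv_cache)} KV entries were built under old weights")
```

In a real serving engine the swap would replace tensors in place (or stream new shards in) while requests keep decoding against whatever KV state they already accumulated; the sketch only captures that version-mixing effect, not the actual mechanism.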