vLLM (@vllm_project)'s Twitter Profile
vLLM

@vllm_project

A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!

ID: 1774187564276289536

Link: https://github.com/vllm-project/vllm · Joined: 30-03-2024 21:31:01

327 Tweets

12.12K Followers

15 Following

ray (@raydistributed)

🚨Meetup Alert🚨

Scaling LLM inference? Join us June 10 in SF for a Ray Meetup with real-world wins from Pinterest + Anyscale

We'll discuss:
⚡ 300x throughput w/ Ray at Pinterest
⚡ DeepSeek + vLLM prod deployment
⚡ Ray Serve + Data LLM preview

📍 Anyscale HQ |
Aurick Qiao (@aurickq)

Excited to open-source Shift Parallelism, developed at Snowflake AI Research for LLM inference!

With it, Arctic Inference + vLLM delivers:

🚀3.4x faster e2e latency & 1.06x higher throughput
🚀1.7x faster generation & 2.25x lower response time
🚀16x higher throughput
EmbeddedLLM (@embeddedllm)

vLLM 0.9.0 is HERE, unleashing HUGE performance on AMD GPUs!
MI-Series: FP8 KV cache, +19.4% with AITER GEMM, +16.8% Qwen3 MoE (MI300X), +13.8% DeepSeek V3/R1!
Consumer:
📈 RX 9000 Series: +16.4% throughput
🚀 RX 7000 Series: +19.0% performance gains
@AIatAMD @AMDRadeon
vLLM (@vllm_project)

🚀 Join us at the SF AIBrix & vLLM Meetup on June 18th at AWS SF GenAI Loft! Learn from experts at ByteDance, AWS Neuron, and EKS. Discover AIBrix: a scalable, cost-effective control plane for vLLM. Talks, Q&A, pizza, and networking! 🍕🤝 lu.ma/ab2id296

Red Hat AI (@redhat_ai)

vLLM v0.9.0 was a BIG release 🎉
📝 649 commits
👥 143 contributors
👏 82 first-time contributors
Huge thanks to everyone who made it happen! Michael Goin breaks down what's new in vLLM v0.9.0 ⬇️

Red Hat AI (@redhat_ai)

🇯🇵 Join us for an in-person vLLM meetup on Monday, June 16 in Tokyo. Or tune in via live stream!

Agenda:
-Intro to vLLM
-Japanese LLM adoption
-Model optimization w/ LLM Compressor
-Distributed inference w/ llm-d
-Q&A and lightning talks

RSVP: ossbyredhat.connpass.com/event/357695/
vLLM (@vllm_project)

Congrats on the launch! vLLM is proud to support the new Qwen3 embedding models, check it out 👉🏻 github.com/QwenLM/Qwen3-E…
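
For anyone wanting to try the new embedding models, the sketch below shows roughly how a pooling/embedding model can be run through vLLM's offline Python API. The exact model id and the task="embed" / llm.embed() wiring are assumptions, not taken from the tweet, so check the vLLM docs and the Qwen3-Embedding model card for the supported configuration.

# Rough sketch: running a Qwen3 embedding model with vLLM's offline API.
# The model id and the "embed" task wiring are assumptions; verify against
# the vLLM docs and the Qwen3-Embedding model card.
from vllm import LLM

llm = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")  # assumed model id

texts = [
    "vLLM is a high-throughput inference engine.",
    "PagedAttention manages the KV cache in fixed-size blocks.",
]

outputs = llm.embed(texts)                 # one embedding output per input text
for text, out in zip(texts, outputs):
    vector = out.outputs.embedding         # list of floats for this text
    print(f"{text[:40]!r} -> {len(vector)}-dim vector")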

vLLM (@vllm_project)

Thanks for the great investigation! vLLM values usability, performance, and building the ecosystem for LLM inference. Together, let's make open source better ❤️ Stay tuned for the latest updates from vLLM!

ray (@raydistributed)

Our next Ray Meetup is almost here! June 10th 👇

Hear how Pinterest scaled inference 300x with Ray, watch a DeepSeek + vLLM live demo, and get a first look at new Ray Serve tools.

Plus: networking, snacks, and great convos with fellow builders.

Save your seat:
vLLM (@vllm_project)

โฌ†๏ธ uv pip install -U vllm --extra-index-url wheels.vllm.ai/0.9.1rc1 --torch-backend=auto Try out Magistral on with vLLM 0.9.1rc1 today! ๐Ÿ”ฎ

vLLM (@vllm_project)

👀 Look what just arrived at UC Berkeley Sky! 🌟 A shiny MI355X system. Huge thanks to AMD for supporting open source and we are looking forward to getting it set up in the next few days!
Anush Elangovan (@anushelangovan)

Glad to support UC Berkeley Sky and the vLLM community. Day-0 support means you get hardware on Day -2 😀. Looking forward to what the community builds and to accelerating AI adoption.

Robert Nishihara (@robertnishihara)

This table was a footnote at the end of the blog, but it's actually one of the most interesting points. There is an emerging stack for post-training.

anyscale.com/blog/ai-comput…
vLLM (@vllm_project)

Cool to see vLLM used as part of WhatsApp Trusted Execution Environment (TEE) Private Processing!

Paper here: ai.meta.com/static-resourc…