vLLM (@vllm_project)'s Twitter Profile
vLLM

@vllm_project

A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!

ID: 1774187564276289536

Link: https://github.com/vllm-project/vllm · Joined: 30-03-2024 21:31:01

327 Tweets

12.12K Followers

15 Following

ray (@raydistributed)

🚨Meetup Alert🚨

Scaling LLM inference? Join us June 10 in SF for a Ray Meetup with real-world wins from Pinterest + Anyscale

We'll discuss:
⚡ 300x throughput w/ Ray at Pinterest
⚡ DeepSeek + vLLM prod deployment
⚡ Ray Serve + Data LLM preview

📍 Anyscale HQ |
Aurick Qiao (@aurickq)

Excited to open-source Shift Parallelism, developed at Snowflake AI Research for LLM inference!

With it, Arctic Inference + vLLM delivers:

🚀3.4x faster e2e latency & 1.06x higher throughput
🚀1.7x faster generation & 2.25x lower response time
🚀16x higher throughput
EmbeddedLLM (@embeddedllm)

vLLM 0.9.0 is HERE, unleashing HUGE performance on AMD GPUs!
MI-Series: FP8 KV cache, +19.4% with AITER GEMM, +16.8% Qwen3 MoE (MI300X), +13.8% DeepSeek V3/R1!
Consumer:
📈 RX 9000 Series: +16.4% throughput
🚀 RX 7000 Series: +19.0% performance gains
@AIatAMD @AMDRadeon
vLLM (@vllm_project)

🚀 Join us at the SF AIBrix & vLLM Meetup on June 18th at AWS SF GenAI Loft! Learn from experts at ByteDance, AWS Neuron, and EKS. Discover AIBrix: a scalable, cost-effective control plane for vLLM. Talks, Q&A, pizza, and networking! 🍕🤝 lu.ma/ab2id296

Red Hat AI (@redhat_ai)

vLLM v0.9.0 was a BIG release 🎉
📝 649 commits
👥 143 contributors
👏 82 first-time contributors
Huge thanks to everyone who made it happen! Michael Goin breaks down what's new in vLLM v0.9.0 ⬇️

Red Hat AI (@redhat_ai)

🇯🇵 Join us for an in-person vLLM meetup on Monday, June 16 in Tokyo. Or tune in via live stream!

Agenda:
-Intro to vLLM
-Japanese LLM adoption
-Model optimization w/ LLM Compressor
-Distributed inference w/ llm-d
-Q&A and lightning talks

RSVP: ossbyredhat.connpass.com/event/357695/
vLLM (@vllm_project)

Congrats on the launch! vLLM is proud to support the new Qwen3 embedding models, check it out 👉🏻 github.com/QwenLM/Qwen3-E…
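
For anyone wanting to try the new embedding models, the sketch below shows roughly how a pooling/embedding model can be run through vLLM's offline Python API. The exact model id and the task="embed" / llm.embed() wiring are assumptions, not taken from the tweet, so check the vLLM docs and the Qwen3-Embedding model card for the supported configuration.

# Rough sketch: running a Qwen3 embedding model with vLLM's offline API.
# The model id and the "embed" task wiring are assumptions; verify against
# the vLLM docs and the Qwen3-Embedding model card.
from vllm import LLM

llm = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")  # assumed model id

texts = [
    "vLLM is a high-throughput inference engine.",
    "PagedAttention manages the KV cache in fixed-size blocks.",
]

outputs = llm.embed(texts)                 # one embedding output per input text
for text, out in zip(texts, outputs):
    vector = out.outputs.embedding         # list of floats for this text
    print(f"{text[:40]!r} -> {len(vector)}-dim vector")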

vLLM (@vllm_project)

Thanks for the great investigation! vLLM values usability, performance, and building the ecosystem for LLM inference. Together, let's make open source better ❤️ Stay tuned for the latest updates from vLLM!

ray (@raydistributed)

Our next Ray Meetup is almost here! June 10th 👇

Hear how Pinterest scaled inference 300x with Ray, watch a DeepSeek + vLLM live demo, and get a first look at new Ray Serve tools.

Plus: networking, snacks, and great convos with fellow builders.

Save your seat:
vLLM (@vllm_project)

โฌ†๏ธ uv pip install -U vllm --extra-index-url wheels.vllm.ai/0.9.1rc1 --torch-backend=auto Try out Magistral on with vLLM 0.9.1rc1 today! ๐Ÿ”ฎ

vLLM (@vllm_project)

👀 Look what just arrived at UC Berkeley Sky! 🌟 A shiny MI355X system. Huge thanks to AMD for supporting open source and we are looking forward to getting it set up in the next few days!
Anush Elangovan (@anushelangovan)

Glad to support UC Berkeley Sky and the vLLM community. Day-0 support means you get hardware on Day -2 😀. Looking forward to what the community builds and to accelerating AI adoption.

Robert Nishihara (@robertnishihara)

This table was a footnote at the end of the blog, but it's actually one of the most interesting points. There is an emerging stack for post-training.

anyscale.com/blog/ai-comput…
vLLM (@vllm_project)

Cool to see vLLM used as part of WhatsApp Trusted Execution Environment (TEE) Private Processing!

Paper here: ai.meta.com/static-resourc…