EmbeddedLLM (@embeddedllm) Twitter Tweets • TwiCopy

Red Hat AI

a month ago

BIG NEWS! 🎉 Compressed Tensors is officially joining the vLLM project! Built on top of the excellent Hugging Face safetensors framework, Compressed Tensors extends it with efficient storage and management of compressed tensor data for model quantization and sparsity. Why

Likes: 58 · Replies: 2 · Reposts: 11

vLLM

@vllm_project

a month ago

💡 vLLM @ Open Source AI Week!
1️⃣ Wednesday, Oct 23 & Thursday, Oct 24: vLLM @ PyTorch Conference 2025
🚀 Explore vLLM at PyTorch Conference 2025!
📅 Sessions to catch:
1. Easy, Fast, Cheap LLM Serving for Everyone – Simon Mo, Room 2004/2006
2. Open Source Post-Training Stack:

Likes: 104 · Replies: 0 · Reposts: 12

vLLM

@vllm_project

a month ago

We are excited about an open ABI and FFI for ML Systems from Tianqi Chen. In our experience with vLLM, such an interop layer is definitely needed!

Likes: 80 · Replies: 0 · Reposts: 10

vLLM

@vllm_project

a month ago

kvcached works directly with vLLM and you can use it to serve multiple models on the same GPU. They will share unused KV cache blocks. Check it out!

Likes: 553 · Replies: 5 · Reposts: 71

vLLM

@vllm_project

a month ago

🚀 Excited to share our work on batch-invariant inference in vLLM! Now you can get identical results regardless of batch size with just one flag:
VLLM_BATCH_INVARIANT=1
No more subtle differences between bs=1 and bs=N (including prefill!). Let's dive into how we built this 🧵👇

Likes: 276 · Replies: 2 · Reposts: 43
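
For readers wondering why batch size can change results at all: one commonly cited cause is that floating-point addition is not associative, so a kernel that splits a reduction differently at different batch sizes can round differently. The sketch below is a toy illustration of that effect in plain Python. It is not vLLM code and says nothing about how the flag is implemented; the helper names `add_all` and `reduce_in_chunks` are hypothetical.

```python
def add_all(values):
    """Add floats strictly left to right (no compensated summation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_in_chunks(values, chunk_size):
    """Reduce each fixed-size chunk first, then add the partial sums.

    This mimics a reduction whose split depends on a tuning parameter.
    """
    partials = [
        add_all(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return add_all(partials)


# Same numbers, two different reduction orders.
values = [1e16, 1.0, 1.0, 1.0]

one_pass = reduce_in_chunks(values, chunk_size=4)  # ((1e16 + 1) + 1) + 1
chunked = reduce_in_chunks(values, chunk_size=2)   # (1e16 + 1) + (1 + 1)

# Near 1e16 adjacent doubles are 2 apart, so adding 1.0 alone is rounded away,
# while the pre-summed 2.0 survives.
print(one_pass == chunked)  # False
print(chunked - one_pass)   # 2.0
```

A batch-invariant kernel has to avoid this kind of split-dependent ordering, so that the same reduction order is used for a given request no matter what else is in the batch.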

EmbeddedLLM

@embeddedllm

a month ago

Catch us in the special vLLM track at Anyscale Ray Summit 2025, SF Marriott Marquis.
vLLM Deep Dive: Architecture, Performance & Contributing, by Tan TJian.
Can’t wait to see all vLLM friends. anyscale.com/ray-summit/202…

Likes: 6 · Replies: 0 · Reposts: 2

vLLM

@vllm_project

24 days ago

vLLM Sleep Mode 😴→ ⚡Zero-reload model switching for multi-model serving. Benchmarks: 18–200× faster switches and 61–88% faster first inference vs cold starts. Explanation Blog by EmbeddedLLM 👇 Why it’s fast: we keep the process alive, preserving the allocator, CUDA graphs,

Likes: 502 · Replies: 10 · Reposts: 71

vLLM

@vllm_project

23 days ago

🔥 Following our big announcement — here’s the full vLLM takeover at Ray Summit 2025! 📍 San Francisco • Nov 3–5 • Hosted by Anyscale Get ready for deep dives into high-performance inference, unified backends, prefix caching, MoE serving, and large-scale

Likes: 53 · Replies: 1 · Reposts: 9

vLLM

@vllm_project

22 days ago

🎉 Congrats to Kimi.ai! vLLM Day-0 model support expands! Now supporting Kimi Linear, hybrid linear attention with Kimi Delta Attention (KDA):

- RULER 128k context: 84.3 perf + 3.98× speedup
- Up to 6× faster decoding & 6.3× faster TPOT (1M tokens)
- 75% KV cache reduction

💡

Likes: 255 · Replies: 8 · Reposts: 32

Roger Wang

@rogerw0108

19 days ago

There are actually quite a few optimizations specific to multimodal beyond just faster kernels, and I didn’t pull those all-nighters for nothing 🥲

Likes: 60 · Replies: 4 · Reposts: 3

vLLM

@vllm_project

18 days ago

Wow excited to see PewDiePie using vLLM to serve language models locally 😃 vLLM brings easy, fast, and cheap LLM serving for everyone 🥰

Likes: 1.1K · Replies: 14 · Reposts: 49

vLLM

@vllm_project

17 days ago

Wow Quantization-enhanced Reinforcement Learning using vLLM! Great job by Yukang Chen 😃

Likes: 241 · Replies: 5 · Reposts: 32

vLLM

@vllm_project

17 days ago

🔥 Highly requested by the community, PaddleOCR-VL is now officially supported on vLLM! 🚀 Check out our recipe for this model to get started! 👇 docs.vllm.ai/projects/recip…

Likes: 106 · Replies: 8 · Reposts: 10

vLLM

@vllm_project

17 days ago

Amazing work by Rui-Jie (Ridger) Zhu and the ByteDance Seed team — Scaling Latent Reasoning via Looped LMs introduces looped reasoning as a new scaling dimension. 🔥 The Ouro model is now runnable on vLLM (nightly version) — bringing efficient inference to this new paradigm of latent

Likes: 234 · Replies: 3 · Reposts: 36

Kuntai Du

@this_will_echo

15 days ago

Happy to meet with EmbeddedLLM people in person! Thanks for all the hardware support in vLLM and LMCache Lab!

Likes: 10 · Replies: 1 · Reposts: 2

EmbeddedLLM

@embeddedllm

14 days ago

Awesome connecting with the vLLM community in person! Roger Wang, Kuntai Du, Harry Mellor, Simon Mo, Tan TJian, Chendi Xue, Robert Shaw, Brittany Rockwell

Likes: 5 · Replies: 0 · Reposts: 2

vLLM

@vllm_project

9 days ago

Hello (Sawadekap), Bangkok! Ready to glow? ✨ vLLM Meetup, 21 Nov 2025. Hosted by EmbeddedLLM, AMD & Red Hat. Members from the vLLM maintainer team will join us to share their latest insights and roadmap, straight from the source! We've also invited local Thai

Likes: 24 · Replies: 0 · Reposts: 4

vLLM

@vllm_project

9 days ago

Thanks to GitHub for spotlighting vLLM in the Octoverse 2025 report as one of the fastest-growing open-source AI projects this year.

🏆 Top OSS by contributors
🚀 Fastest-growing by contributors
🌱 Attracting the most first-time contributors

Trusted by leading open model

Likes: 117 · Replies: 7 · Reposts: 23

EmbeddedLLM

@embeddedllm

8 days ago

Big night at the vLLM × Meta × AMD meetup in Palo Alto 💥
So fun hanging out IRL with fellow vLLM Woosuk Kwon, Simon Mo and the AMD crew Anush Elangovan and Ramine Roane.

Bonus: heading home with a signed AMD Radeon PRO AI Pro R9700 to squeeze even more tokens/sec out of AMD

Likes: 27 · Replies: 0 · Reposts: 5