Simon Mo (@simon_mo_)'s Twitter Profile
Simon Mo

@simon_mo_

@vllm_project

ID: 1016136948845502465

Joined: 09-07-2018 01:48:22

110 Tweets

1.1K Followers

342 Following

vLLM (@vllm_project)'s Twitter Profile Photo

⬆️ pip install -U vLLM
You can now run DeepSeek-V3 on the latest vLLM in many different ways:
💰 Tensor parallelism on 8xH200 or MI300X, or TP16 on IB-connected nodes: `--tensor-parallel-size`
🌐 Pipeline parallelism (!) across two 8xH100 nodes, or any collection of machines without a high-speed interconnect
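
As a hedged sketch of those two launch modes, using vLLM's offline Python API (the exact model ID `deepseek-ai/DeepSeek-V3` and the offline entry point are assumptions; the tweet itself only names the CLI flag):

```python
# Illustrative sketch, not an official launch recipe.
from vllm import LLM, SamplingParams

# Tensor parallelism: shard the model across 8 GPUs on one node
# (e.g. 8xH200 or MI300X).
llm = LLM(model="deepseek-ai/DeepSeek-V3", tensor_parallel_size=8)

# For two nodes without a high-speed interconnect, pipeline parallelism
# between nodes can be combined with tensor parallelism within each node:
# llm = LLM(model="deepseek-ai/DeepSeek-V3",
#           tensor_parallel_size=8, pipeline_parallel_size=2)

out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```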

NovaSky (@novaskyai)'s Twitter Profile Photo

1/6 🚀 Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450! 📊Blog: novasky-ai.github.io/posts/sky-t1/ 🏋️‍♀️Model weights: huggingface.co/NovaSky-AI/Sky…

Simon Mo (@simon_mo_)'s Twitter Profile Photo

Our biggest milestone yet! I'm particularly excited by how the vLLM contributor community organized across many organizations to deliver a high-quality V1 engine core. We are just getting started 🚀

Costa Huang (@vwxyzjn)'s Twitter Profile Photo

Finally, I want to give a special thanks to the vLLM team (Kaichao You, Woosuk Kwon, Simon Mo, Zhuohan Li) for their invaluable support in debugging NCCL weight-transfer issues. They made our 70B RLVR weight transfer 45x faster and 405B RLVR even possible! See
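
For context, the weight transfer in question follows the general trainer-to-inference-worker broadcast pattern over NCCL; a minimal sketch (this is an illustration of the pattern, not the actual open-instruct or vLLM code, and the process-group setup is assumed to happen elsewhere):

```python
# Sketch: after each RL update, the trainer rank broadcasts the new
# policy weights; inference ranks receive them in place over NCCL.
import torch
import torch.distributed as dist

def sync_weights(model: torch.nn.Module, trainer_rank: int = 0) -> None:
    # Assumes dist.init_process_group("nccl", ...) has already been
    # called on every participating rank.
    for param in model.parameters():
        # broadcast sends from trainer_rank and overwrites param.data
        # in place on all other ranks.
        dist.broadcast(param.data, src=trainer_rank)
```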

Robert Shaw (@robertshaw21)'s Twitter Profile Photo

Landed my first PR in vLLM 1 year ago today (github.com/vllm-project/v…). 38K LOC and 100+ PRs later, and we are just getting started

vLLM (@vllm_project)'s Twitter Profile Photo

We landed the first batch of enhancements to the DeepSeek models, starting with MLA and CUTLASS fp8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.

Roger Wang (@rogerw0108)'s Twitter Profile Photo

Robert and I started contributing to vLLM around the same time, and today is my turn. Back then, vLLM had only about 30 contributors. One year later, the project has received contributions from 800+ community members, and we're just getting started github.com/vllm-project/v…

Simon Mo (@simon_mo_)'s Twitter Profile Photo

Having been at every single vLLM meetup, I won't miss this one :D Looking forward to meeting all the vLLM users in Boston!

vLLM (@vllm_project)'s Twitter Profile Photo

🙏 DeepSeek's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are contributing their changes back to the open-source community so everyone can benefit immediately! github.com/deepseek-ai/op…

Simon Mo (@simon_mo_)'s Twitter Profile Photo

😲 Super cool!!! It reminded me of Kevin's thesis, "Structured Contexts for Large Language Models"; this is such a natural continuation of the idea.

OpenAI Developers (@openaidevs)'s Twitter Profile Photo

Announcing the first Codex open source fund grant recipients:
⬩ vLLM - inference serving engine
⬩ OWASP Nettacker - automated network pentesting
⬩ Pulumi - infrastructure as code in any language (@pulumicorp)
⬩ Dagster - cloud-native data pipelines

Simon Mo (@simon_mo_)'s Twitter Profile Photo

I didn't expect the first section to be "KV-cache hit rate is the single most important metric for a production-stage AI agent", but 🤯
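
For reference, the vLLM feature most directly tied to KV-cache hit rate is automatic prefix caching; a minimal sketch (the `enable_prefix_caching` engine argument is real, but the model ID here is just a stand-in):

```python
# Sketch: with prefix caching on, requests sharing a long prompt prefix
# (system prompt, agent scaffold) reuse cached KV blocks instead of
# recomputing them, which is what drives the hit rate up.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)

scaffold = "You are a careful coding agent.\n" * 50  # shared long prefix
prompts = [scaffold + f"Task {i}: say hi." for i in range(4)]
outputs = llm.generate(prompts, SamplingParams(max_tokens=8))
```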