Simon Mo (@simon_mo_)'s Twitter Profile
Simon Mo

@simon_mo_

@vllm_project

ID: 1016136948845502465

Joined: 09-07-2018 01:48:22

110 Tweets

1.1K Followers

342 Following

vLLM (@vllm_project)'s Twitter Profile Photo

⬆️ pip install -U vLLM
You can now run DeepSeek-V3 on the latest vLLM in many different ways:
💰 Tensor parallelism on 8xH200 or MI300X, or TP16 on IB-connected nodes: `--tensor-parallel-size`
🌐 Pipeline parallelism (!) across two 8xH100 nodes, or any collection of machines without a high-speed interconnect
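
As a hedged sketch of those two launch modes, using vLLM's offline Python API (the exact model ID `deepseek-ai/DeepSeek-V3` and the offline entry point are assumptions; the tweet itself only names the CLI flag):

```python
# Illustrative sketch, not an official launch recipe.
from vllm import LLM, SamplingParams

# Tensor parallelism: shard the model across 8 GPUs on one node
# (e.g. 8xH200 or MI300X).
llm = LLM(model="deepseek-ai/DeepSeek-V3", tensor_parallel_size=8)

# For two nodes without a high-speed interconnect, pipeline parallelism
# between nodes can be combined with tensor parallelism within each node:
# llm = LLM(model="deepseek-ai/DeepSeek-V3",
#           tensor_parallel_size=8, pipeline_parallel_size=2)

out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```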

NovaSky (@novaskyai)'s Twitter Profile Photo

1/6 🚀 Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450! 📊Blog: novasky-ai.github.io/posts/sky-t1/ 🏋️‍♀️Model weights: huggingface.co/NovaSky-AI/Sky…

Simon Mo (@simon_mo_)'s Twitter Profile Photo

Our biggest milestone yet! I'm particularly excited by how the vLLM contributor community organized across many organizations to deliver a high-quality V1 engine core. We are just getting started 🚀

Costa Huang (@vwxyzjn)'s Twitter Profile Photo

Finally, I want to give a special thanks to the vLLM team (Kaichao You, Woosuk Kwon, Simon Mo, Zhuohan Li) for their invaluable support in debugging NCCL weight-transfer issues. They made our 70B RLVR weight transfer 45x faster and 405B RLVR even possible! See
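
For context, the weight transfer in question follows the general trainer-to-inference-worker broadcast pattern over NCCL; a minimal sketch (this is an illustration of the pattern, not the actual open-instruct or vLLM code, and the process-group setup is assumed to happen elsewhere):

```python
# Sketch: after each RL update, the trainer rank broadcasts the new
# policy weights; inference ranks receive them in place over NCCL.
import torch
import torch.distributed as dist

def sync_weights(model: torch.nn.Module, trainer_rank: int = 0) -> None:
    # Assumes dist.init_process_group("nccl", ...) has already been
    # called on every participating rank.
    for param in model.parameters():
        # broadcast sends from trainer_rank and overwrites param.data
        # in place on all other ranks.
        dist.broadcast(param.data, src=trainer_rank)
```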

Robert Shaw (@robertshaw21)'s Twitter Profile Photo

Landed my first PR in vLLM 1 year ago today (github.com/vllm-project/v…). 38K LOC and 100+ PRs later, and we are just getting started

vLLM (@vllm_project)'s Twitter Profile Photo

We landed the first batch of enhancements to the DeepSeek models, starting with MLA and CUTLASS fp8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.

Roger Wang (@rogerw0108)'s Twitter Profile Photo

Robert and I started contributing to vLLM around the same time, and today is my turn. Back then, vLLM had only about 30 contributors. One year later, the project has received contributions from 800+ community members, and we're just getting started github.com/vllm-project/v…

Simon Mo (@simon_mo_)'s Twitter Profile Photo

Having been at every single vLLM meetup, I won't miss this one :D Looking forward to meeting all the vLLM users in Boston!

vLLM (@vllm_project)'s Twitter Profile Photo

🙏 DeepSeek's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are contributing their changes back to the open-source community so everyone can benefit immediately! github.com/deepseek-ai/op…

Simon Mo (@simon_mo_)'s Twitter Profile Photo

😲 Super cool!!! It reminded me of Kevin's thesis, "Structured Contexts for Large Language Models"; this is such a natural continuation of the idea.

OpenAI Developers (@openaidevs)'s Twitter Profile Photo

Announcing the first Codex open source fund grant recipients:
⬩ vLLM - inference serving engine
⬩ OWASP Nettacker - automated network pentesting
⬩ Pulumi - infrastructure as code in any language (@pulumicorp)
⬩ Dagster - cloud-native data pipelines

Simon Mo (@simon_mo_)'s Twitter Profile Photo

I didn't expect the first section to be "KV-cache hit rate is the single most important metric for a production-stage AI agent", but 🤯
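
For reference, the vLLM feature most directly tied to KV-cache hit rate is automatic prefix caching; a minimal sketch (the `enable_prefix_caching` engine argument is real, but the model ID here is just a stand-in):

```python
# Sketch: with prefix caching on, requests sharing a long prompt prefix
# (system prompt, agent scaffold) reuse cached KV blocks instead of
# recomputing them, which is what drives the hit rate up.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)

scaffold = "You are a careful coding agent.\n" * 50  # shared long prefix
prompts = [scaffold + f"Task {i}: say hi." for i in range(4)]
outputs = llm.generate(prompts, SamplingParams(max_tokens=8))
```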