Zhuohao Li (@garricklzh) 's Twitter Profile
Zhuohao Li

@garricklzh

cooking for sglang. prev-@aws @nvidia. accelerating training/inference in all ways. @ucla, @sjtu1896

ID: 1283056394841870337

Joined: 14-07-2020 15:11:11

220 Tweets

519 Followers

446 Following

vLLM (@vllm_project) 's Twitter Profile Photo

A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and 5x faster output latency on Llama 8B, and 1.8x higher throughput and 2x faster output latency on Llama 70B, all on H100s. blog.vllm.ai/2024/09/05/per…

Marques Brownlee (@mkbhd) 's Twitter Profile Photo

Google's new video generation model is called Veo 2, and if these hand-picked examples are real, they look better than anything I've gotten out of Sora... blog.google/technology/goo…

Lianmin Zheng (@lm_zheng) 's Twitter Profile Photo

Highly respected! It is so impressive given the results and the very limited resources they have compared to other big labs. "DeepSeek-V3 is trained on a cluster equipped with 2048 NVIDIA H800 GPUs."

PyTorch (@pytorch) 's Twitter Profile Photo

Scaling Large Language Models (#LLMs) is tough due to resource demands. Quantization helps but often comes with trade-offs in accuracy and performance.

Discover how #PyTorch, Mobius Labs, and SGLang tackled these challenges with Gemlite, TorchAO, and SGLang for efficient

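For readers who want to try the quantization side of this, here is a minimal sketch of weight-only int4 quantization with TorchAO; the toy model, dtype, and exact API names are assumptions based on recent torchao releases, not details pulled from the linked post:

```python
# A minimal sketch of weight-only int4 quantization with TorchAO.
# Assumptions: torch and torchao are installed, a CUDA GPU is available,
# and the API names match recent torchao releases; the model below is a
# hypothetical stand-in for an LLM block, not anything from the post.
import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).to(torch.bfloat16).cuda()

# Swap the weights of eligible nn.Linear layers to int4 in place;
# activations stay in bf16, trading a little accuracy for memory.
quantize_(model, int4_weight_only())

x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([1, 4096])
```

Weight-only schemes like this keep activations in higher precision, which is why they tend to preserve accuracy better than quantizing everything to int4.
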
AMD (@amd) 's Twitter Profile Photo

Excited to share that AMD has integrated the new DeepSeek-V3 model on Instinct MI300X GPUs, designed for peak performance with SGLang. DeepSeek-V3 is optimized for AI inferencing. Special thanks to the DeepSeek and SGLang teams for their close collaboration!

LMSYS Org (@lmsysorg) 's Twitter Profile Photo

The SGLang Team is honored to announce that the following well-known companies and teams, among others, have adopted SGLang for running DeepSeek V3 and R1: AMD, NVIDIA, Microsoft Azure, Baseten, @novita_ai_labs, ByteDance, DataCrunch_io, Hyperbolic, Vultr, and RunPod.

Song Han (@songhan_mit) 's Twitter Profile Photo

We just finished a brand-new workstation for the GeForce RTX 5090. Here are some key learnings about configuration: hanlab.mit.edu/blog/rtx5090

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
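
To make the coarse-to-fine idea concrete, here is an illustrative PyTorch sketch of block-level sparse attention in the same spirit; it is not DeepSeek's NSA implementation, and the block size, top-k, tensor shapes, and the lack of a causal mask are all simplifications invented for the example:

```python
# Illustrative two-level sparse attention: coarse block compression,
# then fine-grained token selection. NOT DeepSeek's NSA implementation;
# block size, top-k, and shapes are invented, and causal masking is
# omitted for brevity.
import torch

def block_sparse_attention(q, k, v, block_size=64, top_k_blocks=4):
    # q: (heads, q_len, d); k, v: (heads, kv_len, d)
    h, q_len, d = q.shape
    kv_len = k.shape[1]
    n_blocks = kv_len // block_size

    # Coarse-grained compression: mean-pool keys within each block.
    k_blocks = k[:, : n_blocks * block_size].reshape(h, n_blocks, block_size, d)
    k_coarse = k_blocks.mean(dim=2)                                # (h, n_blocks, d)

    # Score blocks per query and keep only the top-k blocks.
    block_scores = torch.einsum("hqd,hbd->hqb", q, k_coarse)
    top_blocks = block_scores.topk(top_k_blocks, dim=-1).indices   # (h, q_len, top_k)

    # Fine-grained selection: gather the tokens of the chosen blocks.
    token_idx = top_blocks.unsqueeze(-1) * block_size + torch.arange(
        block_size, device=q.device
    )                                                              # (h, q, top_k, block)
    token_idx = token_idx.flatten(-2)                              # (h, q, top_k*block)
    gather_idx = token_idx.unsqueeze(-1).expand(-1, -1, -1, d)
    k_sel = k.unsqueeze(1).expand(h, q_len, kv_len, d).gather(2, gather_idx)
    v_sel = v.unsqueeze(1).expand(h, q_len, kv_len, d).gather(2, gather_idx)

    # Standard softmax attention restricted to the selected tokens.
    attn = torch.einsum("hqd,hqnd->hqn", q, k_sel) / d**0.5
    return torch.einsum("hqn,hqnd->hqd", attn.softmax(dim=-1), v_sel)

q = torch.randn(8, 16, 64)
k = torch.randn(8, 1024, 64)
v = torch.randn(8, 1024, 64)
out = block_sparse_attention(q, k, v)
print(out.shape)  # torch.Size([8, 16, 64])
```

The payoff is that each query attends to top_k_blocks * block_size tokens instead of the full kv_len, which is what makes long-context attention cheap.
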
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team at DeepSeek exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,

LMSYS Org (@lmsysorg) 's Twitter Profile Photo

@NVIDIAAIDev @nvidia @NVIDIAGeForce @NVIDIAAI

A great honor for SGLang at GTC, with day-0 support from Dynamo!

Roll out your SGLang engine as always, and get the best-ever speed and performance!

At the same time, our SGLang team is at GTC throughout the event. Welcome to connect

Haibin (@eric_haibin_lin) 's Twitter Profile Photo

Seed-Thinking-v1.5, trained with the verl project, achieved strong performance in both reasoning and generalization across diverse domains. 🔥🔥🔥

Grok (@grok) 's Twitter Profile Photo

Today, we are releasing the first version of Grok Studio, adding code execution and Google Drive support. Grok can now generate documents, code, reports, and browser games. Grok Studio will open your content in a separate window, allowing both you and Grok to

Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! 

We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general

LMSYS Org (@lmsysorg) 's Twitter Profile Photo

It is a great honor for SGLang to establish a deep partnership with the Qwen team. Let’s work together to make Qwen3 inference even faster!🚀🚀🚀
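
As a small illustration of what running Qwen3 on SGLang looks like, here is a minimal sketch using SGLang's offline engine API; the model path, sampling settings, and prompt are assumptions for the example, not details from the tweet:

```python
# A minimal sketch of running a Qwen3 model with SGLang's offline engine.
# Assumptions: sglang is installed with GPU support, and the small dense
# Qwen/Qwen3-0.6B checkpoint is used so the example fits on one GPU.
import sglang as sgl

if __name__ == "__main__":
    llm = sgl.Engine(model_path="Qwen/Qwen3-0.6B")

    prompts = ["Explain tensor parallelism in one sentence."]
    sampling_params = {"temperature": 0.6, "max_new_tokens": 128}

    # generate() returns one dict per prompt; "text" holds the completion.
    for out in llm.generate(prompts, sampling_params):
        print(out["text"])

    llm.shutdown()
```

Larger MoE variants like Qwen3-235B-A22B would be served the same way, with tensor parallelism spread across multiple GPUs (e.g. a `tp_size` argument), though exact flags may differ by SGLang version.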

NVIDIA AI Developer (@nvidiaaidev) 's Twitter Profile Photo

.LMSYS Org (SGLang) now achieves 7,583 tokens per second per GPU running DeepSeek R1 on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at
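
A quick back-of-the-envelope check on those figures; the H100 baseline below is implied by the stated 2.7x speedup rather than reported directly in the tweet:

```python
# Arithmetic implied by the tweet's numbers; nothing here is measured.
gb200_tok_per_s_per_gpu = 7_583
speedup_vs_h100 = 2.7

# Implied H100 baseline: 7583 / 2.7 ≈ 2808 tok/s/GPU.
h100_tok_per_s_per_gpu = gb200_tok_per_s_per_gpu / speedup_vs_h100
print(f"Implied H100 baseline: {h100_tok_per_s_per_gpu:,.0f} tok/s/GPU")

# All else equal, cost per token scales inversely with throughput,
# so a 2.7x throughput gain means roughly 0.37x the cost per token.
print(f"Relative cost per token: {1 / speedup_vs_h100:.2f}x")
```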