Zhuohao Li (@garricklzh) 's Twitter Profile
Zhuohao Li

@garricklzh

cooking for sglang. prev-@aws @nvidia. accelerating training/inference in all ways. @ucla, @sjtu1896

ID: 1283056394841870337

Joined: 14-07-2020 15:11:11

220 Tweets

519 Followers

446 Following

vLLM (@vllm_project) 's Twitter Profile Photo

A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and 5x faster output latency on Llama 8B, and 1.8x higher throughput and 2x faster output latency on Llama 70B, all on H100s. blog.vllm.ai/2024/09/05/per…

Marques Brownlee (@mkbhd) 's Twitter Profile Photo

Google's new video generation model is called Veo 2, and if these hand-picked examples are real, they look better than anything I've gotten out of Sora... blog.google/technology/goo…

Lianmin Zheng (@lm_zheng) 's Twitter Profile Photo

Highly respected! It is so impressive given the results and the very limited resources they have compared to other big labs. "DeepSeek-V3 is trained on a cluster equipped with 2048 NVIDIA H800 GPUs."

PyTorch (@pytorch) 's Twitter Profile Photo

Scaling Large Language Models (#LLMs) is tough due to resource demands. Quantization helps but often comes with trade-offs in accuracy and performance.

Discover how #PyTorch, Mobius Labs, and SGLang tackled these challenges with Gemlite, TorchAO, and SGLang for efficient

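For readers who want to try the quantization side of this, here is a minimal sketch of weight-only int4 quantization with TorchAO; the toy model, dtype, and exact API names are assumptions based on recent torchao releases, not details pulled from the linked post:

```python
# A minimal sketch of weight-only int4 quantization with TorchAO.
# Assumptions: torch and torchao are installed, a CUDA GPU is available,
# and the API names match recent torchao releases; the model below is a
# hypothetical stand-in for an LLM block, not anything from the post.
import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).to(torch.bfloat16).cuda()

# Swap the weights of eligible nn.Linear layers to int4 in place;
# activations stay in bf16, trading a little accuracy for memory.
quantize_(model, int4_weight_only())

x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([1, 4096])
```

Weight-only schemes like this keep activations in higher precision, which is why they tend to preserve accuracy better than quantizing everything to int4.
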
AMD (@amd) 's Twitter Profile Photo

Excited to share that AMD has integrated the new DeepSeek-V3 model on Instinct MI300X GPUs, designed for peak performance with SGLang. DeepSeek-V3 is optimized for AI inferencing. Special thanks to the DeepSeek and SGLang teams for their close collaboration!

LMSYS Org (@lmsysorg) 's Twitter Profile Photo

The SGLang Team is honored to announce that the following well-known companies and teams, among others, have adopted SGLang for running DeepSeek V3 and R1: AMD, NVIDIA, Microsoft Azure, Baseten, @novita_ai_labs, ByteDance, DataCrunch_io, Hyperbolic, Vultr, and RunPod.

Song Han (@songhan_mit) 's Twitter Profile Photo

We just finished a brand-new workstation for the GeForce RTX 5090. Here are some key learnings about configuration: hanlab.mit.edu/blog/rtx5090

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
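
To make the coarse-to-fine idea concrete, here is an illustrative PyTorch sketch of block-level sparse attention in the same spirit; it is not DeepSeek's NSA implementation, and the block size, top-k, tensor shapes, and the lack of a causal mask are all simplifications invented for the example:

```python
# Illustrative two-level sparse attention: coarse block compression,
# then fine-grained token selection. NOT DeepSeek's NSA implementation;
# block size, top-k, and shapes are invented, and causal masking is
# omitted for brevity.
import torch

def block_sparse_attention(q, k, v, block_size=64, top_k_blocks=4):
    # q: (heads, q_len, d); k, v: (heads, kv_len, d)
    h, q_len, d = q.shape
    kv_len = k.shape[1]
    n_blocks = kv_len // block_size

    # Coarse-grained compression: mean-pool keys within each block.
    k_blocks = k[:, : n_blocks * block_size].reshape(h, n_blocks, block_size, d)
    k_coarse = k_blocks.mean(dim=2)                                # (h, n_blocks, d)

    # Score blocks per query and keep only the top-k blocks.
    block_scores = torch.einsum("hqd,hbd->hqb", q, k_coarse)
    top_blocks = block_scores.topk(top_k_blocks, dim=-1).indices   # (h, q_len, top_k)

    # Fine-grained selection: gather the tokens of the chosen blocks.
    token_idx = top_blocks.unsqueeze(-1) * block_size + torch.arange(
        block_size, device=q.device
    )                                                              # (h, q, top_k, block)
    token_idx = token_idx.flatten(-2)                              # (h, q, top_k*block)
    gather_idx = token_idx.unsqueeze(-1).expand(-1, -1, -1, d)
    k_sel = k.unsqueeze(1).expand(h, q_len, kv_len, d).gather(2, gather_idx)
    v_sel = v.unsqueeze(1).expand(h, q_len, kv_len, d).gather(2, gather_idx)

    # Standard softmax attention restricted to the selected tokens.
    attn = torch.einsum("hqd,hqnd->hqn", q, k_sel) / d**0.5
    return torch.einsum("hqn,hqnd->hqd", attn.softmax(dim=-1), v_sel)

q = torch.randn(8, 16, 64)
k = torch.randn(8, 1024, 64)
v = torch.randn(8, 1024, 64)
out = block_sparse_attention(q, k, v)
print(out.shape)  # torch.Size([8, 16, 64])
```

The payoff is that each query attends to top_k_blocks * block_size tokens instead of the full kv_len, which is what makes long-context attention cheap.
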
DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team at DeepSeek exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented,

LMSYS Org (@lmsysorg) 's Twitter Profile Photo

@NVIDIAAIDev @nvidia @NVIDIAGeForce @NVIDIAAI

A great honor for SGLang at GTC, with day-0 support from Dynamo!

Roll out your SGLang engine as always, and get the best-ever speed and performance!

At the same time, our SGLang team is at GTC throughout the event. Welcome to connect

Haibin (@eric_haibin_lin) 's Twitter Profile Photo

Seed-Thinking-v1.5, trained with the verl project, achieved strong performance in both reasoning and generalization across diverse domains. 🔥🔥🔥

Grok (@grok) 's Twitter Profile Photo

Today, we are releasing the first version of Grok Studio, adding code execution and Google Drive support. Grok can now generate documents, code, reports, and browser games. Grok Studio will open your content in a separate window, allowing both you and Grok to

Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! 

We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general

LMSYS Org (@lmsysorg) 's Twitter Profile Photo

It is a great honor for SGLang to establish a deep partnership with the Qwen team. Let’s work together to make Qwen3 inference even faster!🚀🚀🚀
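
As a small illustration of what running Qwen3 on SGLang looks like, here is a minimal sketch using SGLang's offline engine API; the model path, sampling settings, and prompt are assumptions for the example, not details from the tweet:

```python
# A minimal sketch of running a Qwen3 model with SGLang's offline engine.
# Assumptions: sglang is installed with GPU support, and the small dense
# Qwen/Qwen3-0.6B checkpoint is used so the example fits on one GPU.
import sglang as sgl

if __name__ == "__main__":
    llm = sgl.Engine(model_path="Qwen/Qwen3-0.6B")

    prompts = ["Explain tensor parallelism in one sentence."]
    sampling_params = {"temperature": 0.6, "max_new_tokens": 128}

    # generate() returns one dict per prompt; "text" holds the completion.
    for out in llm.generate(prompts, sampling_params):
        print(out["text"])

    llm.shutdown()
```

Larger MoE variants like Qwen3-235B-A22B would be served the same way, with tensor parallelism spread across multiple GPUs (e.g. a `tp_size` argument), though exact flags may differ by SGLang version.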

NVIDIA AI Developer (@nvidiaaidev) 's Twitter Profile Photo

.LMSYS Org (SGLang) now achieves 7,583 tokens per second per GPU running DeepSeek R1 on the GB200 NVL72, a 2.7x leap over H100.

We're excited to see the open source ecosystem advance inference optimizations on GB200 NVL72, driving down cost per token for the industry at
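
A quick back-of-the-envelope check on those figures; the H100 baseline below is implied by the stated 2.7x speedup rather than reported directly in the tweet:

```python
# Arithmetic implied by the tweet's numbers; nothing here is measured.
gb200_tok_per_s_per_gpu = 7_583
speedup_vs_h100 = 2.7

# Implied H100 baseline: 7583 / 2.7 ≈ 2808 tok/s/GPU.
h100_tok_per_s_per_gpu = gb200_tok_per_s_per_gpu / speedup_vs_h100
print(f"Implied H100 baseline: {h100_tok_per_s_per_gpu:,.0f} tok/s/GPU")

# All else equal, cost per token scales inversely with throughput,
# so a 2.7x throughput gain means roughly 0.37x the cost per token.
print(f"Relative cost per token: {1 / speedup_vs_h100:.2f}x")
```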