yv | AS11414 | N6YVB (@yvbbrjdr)'s Twitter Profile
yv | AS11414 | N6YVB

@yvbbrjdr

exists as 451; RT ≠ Endorsement; Like = Thanks/Acknowledgement ≠ Agreement; Creator of @LANDropApp, @AthenaAGI; Senior System Software Engineer @NVIDIA

ID: 1603444903

Website: https://yvb.moe/ · Joined: 18-07-2013 13:15:19

3.3K Tweets

1.1K Followers

365 Following

Lianmin Zheng (@lm_zheng)'s Twitter Profile Photo

Personal AI computing has just become more accessible than ever before with the new DGX Spark from NVIDIA - a tiny yet powerful Blackwell GPU in your hand. Check out the firsthand unboxing video and blog from the SGLang community members yv and Richard Chen.

LMSYS Org (@lmsysorg)'s Twitter Profile Photo

🚀 Excited to collaborate with NVIDIA and SemiAnalysis on pushing inference performance to the next level!

On the Blackwell GB200 NVL72, SGLang achieved 26K input / 13K output tokens per GPU/sec. On the SemiAnalysis InferenceMAX benchmark, SGLang is the default engine for

LMSYS Org (@lmsysorg)'s Twitter Profile Photo

Exciting updates on DGX Spark: Now you can run gpt-oss-20b at 70 tokens/s with SGLang! This is 1.4x faster than what we got in our blog last week.

We worked with the NVIDIA AI Developer team to fix a bunch of Triton and quantization issues. Cannot wait to see how much performance we

LMSYS Org (@lmsysorg)'s Twitter Profile Photo

SGLang now runs natively on TPU with a new pure JAX backend!

SGLang-Jax leverages SGLang's high-performance server architecture and uses JAX to compile the model's forward pass. By combining SGLang and JAX, it delivers fast, native TPU inference while maintaining support for

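The SGLang-Jax announcement centers on one mechanism: JAX traces the model's forward pass and compiles it with XLA for the available backend. A minimal sketch of that idea, using a toy two-layer MLP as a stand-in (all shapes and names here are illustrative, not SGLang-Jax's actual model code):

```python
# Sketch: JAX jit-compiling a forward pass, the same mechanism
# SGLang-Jax uses to target TPU. The MLP below is a toy stand-in.
import jax
import jax.numpy as jnp

def forward(params, x):
    """Toy forward pass: two dense layers with a ReLU in between."""
    h = jax.nn.relu(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

# jax.jit traces `forward` once, then XLA compiles it for the
# available backend (TPU if present, else GPU/CPU).
forward_jit = jax.jit(forward)

key = jax.random.PRNGKey(0)
params = {
    "w1": jax.random.normal(key, (16, 32)) * 0.1,
    "b1": jnp.zeros(32),
    "w2": jax.random.normal(key, (32, 8)) * 0.1,
    "b2": jnp.zeros(8),
}
x = jnp.ones((4, 16))
out = forward_jit(params, x)
print(out.shape)  # (4, 8)
```

Because the serving layer (scheduling, batching) is separate from the compiled forward pass, the same server architecture can sit on top of different backends.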
LMSYS Org (@lmsysorg)'s Twitter Profile Photo

During the LF InfiniEdge AI Meetup on Oct 23, we demoed running GPT-OSS on the NVIDIA DGX Spark with SGLang, hitting blazing-fast speeds: 70 tps on 20B and 50 tps on 120B ⚡️

We also showed how to use Open WebUI and even Claude Code with the local SGLang.

Here’s our recap blog

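Tools like Open WebUI and Claude Code can talk to a local SGLang instance because it serves an OpenAI-compatible HTTP API. A minimal sketch of querying such a local endpoint from Python (the host, port, and model name below are assumptions for illustration, not taken from the demo):

```python
# Sketch: querying a local SGLang server via its OpenAI-compatible
# /v1/chat/completions endpoint. Host, port, and model name are
# illustrative assumptions.
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumed local server address

def build_chat_request(prompt: str, model: str = "gpt-oss-20b") -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_chat_request("Hello from DGX Spark!")

try:
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except OSError:
    # No server running locally; the payload above shows the request shape.
    print("SGLang server not reachable at", BASE_URL)
```

Pointing a client's base URL at the local server is also how Open WebUI or Claude Code would be wired up, since both speak the same OpenAI-style protocol.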