yv | AS11414 | N6YVB (@yvbbrjdr) Twitter Tweets • TwiCopy

yv | AS11414 | N6YVB

@yvbbrjdr

exists as 451; RT ≠ Endorsement; Like = Thanks/Acknowledgement ≠ Agreement; Creator of @LANDropApp, @AthenaAGI; Senior System Software Engineer @NVIDIA

ID: 1603444903

Link: https://yvb.moe/

Joined: 18-07-2013 13:15:19

3.3K Tweets

1.1K Followers

365 Following

yv | AS11414 | N6YVB

@yvbbrjdr

3 months ago

This might be the first time I have appeared on camera online

Likes: 26 · Replies: 5 · Retweets: 0

Personal AI computing has just become more accessible than ever before with the new DGX Spark from NVIDIA - a tiny yet powerful Blackwell GPU in your hand. Check out the firsthand unboxing video and blog from the SGLang community members yv and Richard Chen.

Likes: 173 · Replies: 8 · Retweets: 12

Ying Sheng

@ying11231

3 months ago

Congrats on NVIDIA DGX Spark ⚡️ We have a YouTube video this time, given by yv 🤩

Likes: 57 · Replies: 1 · Retweets: 4

LMSYS Org

@lmsysorg

3 months ago

🚀 Excited to collaborate with NVIDIA and SemiAnalysis on pushing inference performance to the next level! On the Blackwell GB200 NVL72, SGLang achieved 26K input / 13K output tokens per GPU/sec. On the SemiAnalysis InferenceMAX benchmark, SGLang is the default engine for

Likes: 62 · Replies: 5 · Retweets: 9

yv | AS11414 | N6YVB

@yvbbrjdr

3 months ago

Andrej keeps getting better at pulling off fun stunts, but he is also becoming more and more of an indie developer..

Likes: 12 · Replies: 1 · Retweets: 0

yv | AS11414 | N6YVB

@yvbbrjdr

3 months ago

He is finally done for (gg)

Likes: 16 · Replies: 2 · Retweets: 1

yv | AS11414 | N6YVB

@yvbbrjdr

2 months ago

That is actually not wrong: most business logic is just map and reduce, plus a filter, which in turn reduces to for and if

Likes: 30 · Replies: 0 · Retweets: 0

yv | AS11414 | N6YVB

@yvbbrjdr

2 months ago

Then this image can be extended: VSCode, Cursor, Windsurf, ChatGPT.app, and Claude.app are all Chromium

Likes: 6 · Replies: 0 · Retweets: 0

LMSYS Org

@lmsysorg

2 months ago

Exciting updates on DGX Spark: Now you can run gpt-oss-20b at 70 tokens/s with SGLang! This is 1.4x faster than what we got in our blog last week. We worked with the NVIDIA AI Developer team to fix a bunch of Triton and quantization issues. Cannot wait to see how much performance we

Likes: 148 · Replies: 11 · Retweets: 19

Lianmin Zheng

@lm_zheng

2 months ago

1.4x speedup after one week of release!

Likes: 148 · Replies: 4 · Retweets: 8

yv | AS11414 | N6YVB

@yvbbrjdr

2 months ago

This tweet of mine from two years ago keeps gaining in value

Likes: 17 · Replies: 1 · Retweets: 2

LMSYS Org

@lmsysorg

2 months ago

SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for

Likes: 95 · Replies: 4 · Retweets: 19

yv | AS11414 | N6YVB

@yvbbrjdr

2 months ago

float64 is all you need

Likes: 3 · Replies: 0 · Retweets: 0

yv | AS11414 | N6YVB

@yvbbrjdr

2 months ago

Thought experiment: Model the entirety of a brain (in a computer) and mock everything else. Do you get consciousness?

Likes: 0 · Replies: 0 · Retweets: 0

LMSYS Org

@lmsysorg

2 months ago

During the LF InfiniEdge AI Meetup on Oct 23, we demoed running GPT-OSS on the NVIDIA DGX Spark with SGLang, hitting blazing-fast speeds: 70 tps on 20B and 50 tps on 120B ⚡️ We also showed how to use Open WebUI and even Claude Code with the local SGLang. Here’s our recap blog