Seba (@culstory)'s Twitter Profile
Seba

@culstory

TLDR

ID: 343658207

27-07-2011 22:06:20

214 Tweets

261 Followers

2.2K Following


been playing with 2 AMD MI50s, okay performance with sglang.
- qwen3-8B: ~60 t/s gen with tp2 at small ctx (~900 GB/s bw reached), 50 t/s at ctx 4k
- qwen3-4b on a single gpu: ~80 t/s (~600 GB/s bw)
individual kernel bws:
- up+gate matmul kernels: ~850 GB/s
- down matmul: 800 GB/s
- qkv: 870 GB/s
- o: 800 GB/s
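
A rough sanity check of how the t/s and GB/s figures above relate, as a minimal sketch. It assumes decoding is memory-bound, that every weight is read once per generated token, and that weights are stored in 16 bits; none of this is stated in the tweet. The parameter counts (8 and 4 billion) are nominal values taken from the model names, and `effective_bandwidth_gbs` is a hypothetical helper, not part of sglang.

```python
def effective_bandwidth_gbs(params_billion: float,
                            tokens_per_s: float,
                            bytes_per_param: float = 2.0) -> float:
    """Estimate weight-read bandwidth in GB/s for memory-bound decoding.

    Assumption: each generated token reads all weights exactly once,
    and KV-cache and activation traffic are ignored.
    """
    # billions of params * bytes per param = GB of weights read per token
    weight_gb_per_token = params_billion * bytes_per_param
    return weight_gb_per_token * tokens_per_s


if __name__ == "__main__":
    # (label, nominal params in billions, reported tokens/s) from the tweet;
    # 16-bit weights are an assumption.
    runs = [
        ("qwen3-8B, tp2, small ctx", 8.0, 60.0),
        ("qwen3-4b, single gpu", 4.0, 80.0),
    ]
    for label, params_b, tps in runs:
        bw = effective_bandwidth_gbs(params_b, tps)
        print(f"{label}: ~{bw:.0f} GB/s")
```

Under these assumptions the estimates come out in the same range as the reported ~900 GB/s (summed over both GPUs with tp2) and ~600 GB/s; the gap would depend on the real parameter counts and on traffic the sketch ignores.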


interesting pair of papers yesterday:
- bytedance chads with UltraMemV2: arxiv.org/abs/2508.18756
- Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks: arxiv.org/abs/2508.18672


with language diffusion model research slowly catching on, my biggest hunch is that a param-heavy encoder paired with a small-weights, flops-heavy decoder would be a great fit for current consumer hw. waiting eagerly for what neurips.cc/virtual/2025/p… has to show us