main (@main_horse) Twitter Tweets • TwiCopy

main

@main_horse

Celebrating excellence

ID: 1605840745960591360

Link: https://blog.main.horse

Date: 22-12-2022 08:20:58

4,4K Tweets

12,12K Followers

777 Following

JingyuanLiu

@jingyuanliu123

4 months ago

I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different:

US labs be like:
- lots of GPUs and much larger flops run
- Treating stabilities more seriously, and could not tolerate spikes

Likes: 3,3K · Replies: 59 · Reposts: 343

SzymonOzog

@szymonozog_

3 months ago

Blog: szymonozog.github.io/posts/2025-09-… Repo: github.com/SzymonOzog/Pen…

Likes: 46 · Replies: 0 · Reposts: 6

exns

@euxenus

3 months ago

check out the full post for more details: deklan.dev/hnet-router

Likes: 19 · Replies: 1 · Reposts: 2

Benjamin F Spector

@bfspector

3 months ago

(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.

Likes: 321 · Replies: 7 · Reposts: 48

Aleksa Gordić (水平问题)

@gordic_aleksa

3 months ago

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along. (Remember matmul is the single most important operation that transformers execute

Likes: 2,2K · Replies: 47 · Reposts: 390

JingyuanLiu

@jingyuanliu123

3 months ago

zhihu.com/question/19561… Why dpskv3.2 is exciting for both sparse attn and linear attn communities from Songlin Yang (Alert: this is in Chinese) the basic summary is: 1. after all, though swa and linear attn are popular, it is still hard to get rid of the full attn layer for

Likes: 128 · Replies: 3 · Reposts: 16

tom cunningham

@testingham

3 months ago

2. GDP will be a poor proxy for AI’s impact. AI’s benefits are likely to elude GDP for two reasons: (1) it will reduce the necessity for exchange (and GDP measures exchange); (2) it will lower the labor required for services, and the value-added from services are typically

Likes: 76 · Replies: 2 · Reposts: 8