main (@main_horse)'s Twitter Profile
main

@main_horse

Celebrating excellence

ID: 1605840745960591360

Link: https://blog.main.horse · Joined: 22-12-2022 08:20:58

4.4K Tweets

12.12K Followers

777 Following

JingyuanLiu (@jingyuanliu123)'s Twitter Profile Photo

I was lucky to work in both China and US LLM labs, and I've been thinking about this for a while. The current values of pretraining are indeed different. US labs be like:
- lots of GPUs and much larger-FLOP runs
- treating stability more seriously, and cannot tolerate spikes

Benjamin F Spector (@bfspector)'s Twitter Profile Photo

(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.

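Of the features listed above, the paged KV cache is the easiest to illustrate in isolation: each sequence's logical token positions are mapped through a block table onto fixed-size physical blocks, so its cache never has to be contiguous. A minimal sketch in plain Python (not the released megakernel's code; `BLOCK_SIZE`, the class, and all method names are illustrative assumptions):

```python
import numpy as np

BLOCK_SIZE = 16   # tokens per KV block (assumed; real engines often use 16 or 32)
HEAD_DIM = 8      # toy head dimension

class PagedKVCache:
    """Toy paged KV cache: logical token positions map to (block, offset)
    slots in a physical pool, so a sequence's cache need not be contiguous."""

    def __init__(self, num_blocks: int):
        self.k_pool = np.zeros((num_blocks, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
        self.v_pool = np.zeros((num_blocks, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids
        self.seq_lens: dict[int, int] = {}

    def append(self, seq_id: int, k: np.ndarray, v: np.ndarray) -> None:
        table = self.block_tables.setdefault(seq_id, [])
        pos = self.seq_lens.get(seq_id, 0)
        if pos % BLOCK_SIZE == 0:            # current block is full: grab a new one
            table.append(self.free_blocks.pop())
        blk, off = table[pos // BLOCK_SIZE], pos % BLOCK_SIZE
        self.k_pool[blk, off] = k
        self.v_pool[blk, off] = v
        self.seq_lens[seq_id] = pos + 1

    def gather_keys(self, seq_id: int) -> np.ndarray:
        """Reassemble the logical K sequence by walking the block table."""
        n = self.seq_lens[seq_id]
        table = self.block_tables[seq_id]
        return np.concatenate([self.k_pool[b] for b in table])[:n]

cache = PagedKVCache(num_blocks=64)
for t in range(20):   # 20 tokens spill across two physical blocks
    cache.append(seq_id=0,
                 k=np.full(HEAD_DIM, t, np.float32),
                 v=np.zeros(HEAD_DIM, np.float32))
assert cache.gather_keys(0).shape == (20, HEAD_DIM)
```

Pooling small blocks like this is what lets an engine pack mixed prefill and decode requests into one batch without pre-reserving contiguous memory per sequence.
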
Aleksa Gordić (水平问题) (@gordic_aleksa)'s Twitter Profile Photo

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along.

(Remember matmul is the single most important operation that transformers execute.)
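
The blog post covers the CUDA specifics; the structural idea underneath every fast matmul kernel is tiling: stage small blocks of A and B in fast memory and reuse each loaded element many times. A toy NumPy sketch of that blocking structure (illustrative only; a real kernel would stage tiles through shared memory, use tensor cores, and pipeline the loads):

```python
import numpy as np

TILE = 32  # tile edge; on a GPU this would match the shared-memory/warp tile shape

def tiled_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Blocked matmul: each (i, j) output tile accumulates over k-tiles,
    mirroring how a CUDA kernel stages A/B tiles through shared memory."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % TILE == 0 and N % TILE == 0 and K % TILE == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            acc = np.zeros((TILE, TILE), dtype=A.dtype)  # lives in registers on a GPU
            for k in range(0, K, TILE):
                a = A[i:i+TILE, k:k+TILE]   # "load a tile of A into shared memory"
                b = B[k:k+TILE, j:j+TILE]   # "load a tile of B into shared memory"
                acc += a @ b                # reuse each loaded element TILE times
            C[i:i+TILE, j:j+TILE] = acc
    return C

A = np.random.rand(128, 128).astype(np.float32)
B = np.random.rand(128, 128).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```

The payoff is arithmetic intensity: each loaded tile element participates in TILE multiply-accumulates, which is what keeps the kernel compute-bound rather than memory-bound.
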
JingyuanLiu (@jingyuanliu123)'s Twitter Profile Photo

zhihu.com/question/19561… Why dpskv3.2 is exciting for both the sparse-attention and linear-attention communities, from Songlin Yang (alert: this is in Chinese). The basic summary is: 1. After all, though SWA and linear attn are popular, it is still hard to get rid of the full attention layer for
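
The point about full attention is easy to see from the masks themselves: a sliding-window layer lets each query see only the last `window` keys, so information from distant tokens must hop through many layers, while one full-attention layer sees everything at once. A small sketch (toy sizes are assumptions; this builds masks only, not actual attention weights):

```python
import numpy as np

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    """Causal mask where query i can only see keys in (i - window, i]."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def full_causal_mask(n: int) -> np.ndarray:
    """Standard causal mask: query i sees all keys j <= i."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return j <= i

n, window = 1024, 128
swa = sliding_window_mask(n, window)
full = full_causal_mask(n)

# In a pure-SWA stack of depth d, the last token is influenced by at most
# roughly d * window earlier positions, so long-range retrieval needs either
# many layers or at least one full-attention layer mixed into the stack.
print(swa[-1].sum(), full[-1].sum())   # 128 vs 1024 visible keys for the last query
```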

tom cunningham (@testingham)'s Twitter Profile Photo

2. GDP will be a poor proxy for AI’s impact. AI’s benefits are likely to elude GDP for two reasons: (1) it will reduce the necessity for exchange (and GDP measures exchange); (2) it will lower the labor required for services, and the value-added from services is typically