Daniel Ching (@danielchingwq)'s Twitter Profile
Daniel Ching

@danielchingwq

🇸🇬 | grokking @menloresearch | 💻, 🏃‍♂️, 🫀

ID: 1408309359927431168

Website: https://danielcwq.com
Joined: 25-06-2021 06:22:00

964 Tweets

436 Followers

646 Following

justin wu (@byjustinwu)

4th day in the loo 👍

✅attended kw redeemer church; their pastor is very funny and cool
✅a 3rd yr UW christian paid for ALL 12 first years’ lunch… insane.
✅chat w my don omar; he’s pre chill and shouted me out in his email LMAO
Pradyumna (@pradyuprasad)

This essay is correct in many ways. The question, then, is: now what? I think it is upon us to change it, and to make this the golden age of our glorious yet nascent city-state.

Aditya Makkar (@adityamakkar000)

Wrote an in-depth blog on Scaling Modern Transformers with n-D parallelism.

Includes an explanation with code on how modern-day transformers (multi-latent attention, decoupled RoPE, interleaved-attention, mixture-of-experts, etc.) are built across a multi-node TPU cluster.
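
The details live in the blog itself; as a rough sketch of what "n-D parallelism" means in practice, here is a minimal JAX example that lays devices out in a named 2-D mesh and shards one matmul across it. The 2x4 mesh, the axis names, and the shapes are my own illustrative assumptions, not taken from the post.

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# Assumes 8 accelerators on the host (e.g. one TPU v3-8 board).
devices = mesh_utils.create_device_mesh((2, 4))  # 2-way data x 4-way tensor
mesh = Mesh(devices, axis_names=("data", "model"))

# Activations are split along the batch dim, weights along the hidden dim.
x = jax.device_put(jnp.ones((32, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    # XLA propagates the shardings: each device computes its own
    # (batch-shard, hidden-shard) tile, and collectives are inserted
    # automatically whenever a layout actually requires communication.
    return jnp.einsum("bd,dh->bh", x, w)

y = layer(x, w)
print(y.sharding)  # typically PartitionSpec('data', 'model')

Stacking further named axes (pipeline, expert, sequence) onto the same mesh is what turns this 2-D layout into the n-D parallelism the post covers.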
Chinmay Jindal (@chinmayjindal_)

ever wondered how billion-param transformers are actually trained across massive TPU clusters?

i just built one from scratch and wrote a step-by-step guide (with code). introducing JAXformer: a modern transformer with n-D parallelism written entirely in JAX. 

here's how 👇
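
Since the thread has the full walkthrough, here is only a hedged companion sketch of the other style JAX offers for the same thing: shard_map, where the per-device program and its collectives are spelled out by hand. The 1-D "model" mesh, the shapes, and the function name are my own assumptions, not JAXformer's API.

import jax
import jax.numpy as jnp
from functools import partial
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental import mesh_utils
from jax.experimental.shard_map import shard_map

devices = mesh_utils.create_device_mesh((4,))  # assumes 4 accelerators
mesh = Mesh(devices, axis_names=("model",))

@partial(shard_map, mesh=mesh,
         in_specs=(P(None, "model"), P("model", None)),  # x col-sharded, w row-sharded
         out_specs=P(None, None))                        # output replicated
def tp_matmul(x_block, w_block):
    # Each device multiplies its slice of the contraction dimension, then an
    # explicit all-reduce over the "model" axis sums the partial products.
    return jax.lax.psum(x_block @ w_block, axis_name="model")

x = jnp.ones((8, 1024))
w = jnp.ones((1024, 256))
y = jax.jit(tp_matmul)(x, w)  # (8, 256), identical on every device

The psum here is the hand-placed collective that the annotation-driven version above leaves for XLA to insert.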
Kevin Thomas (@kevinjosethomas)

earlier this summer, i headed to Toronto and spent a month continuing my work at UofT's Computational Social Science Lab

found a little time to write some thoughts ↓
Thien Tran (@gaunernst)

New blogpost: Use NVRTC to explore MMA instruction variants, including FP16 accumulate and INT4.

gau-nernst.github.io/nvrtc-matmul/

Thanks Mark Saroufim for bringing NVRTC to PyTorch, and mobicham for (indirectly) giving me the idea.
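
The MMA specifics are in the linked post; below is only a minimal, hedged illustration of the mechanism it builds on: compiling CUDA source at runtime with NVRTC, which is what lets you generate and benchmark many kernel variants in a loop. It is driven through CuPy's RawKernel (which compiles via NVRTC), with a stand-in SAXPY kernel rather than the post's tensor-core experiments, and it assumes a CUDA-capable GPU.

import cupy as cp

source = r'''
extern "C" __global__
void saxpy(const float* x, const float* y, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}
'''

# NVRTC compiles the string above on first use; editing the source and
# re-instantiating gives a fresh variant without leaving Python.
kernel = cp.RawKernel(source, "saxpy", backend="nvrtc")

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.random.rand(n, dtype=cp.float32)
out = cp.empty_like(x)
kernel(((n + 255) // 256,), (256,), (x, y, out, cp.float32(2.0), cp.int32(n)))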