anirudh bv (@anirudhbv_ce)'s Twitter Profile
anirudh bv

@anirudhbv_ce

mle @ shopify
mlh top 50
uwaterloo ce '30


Link: https://v0-anidev.vercel.app/ · Joined: 11-09-2025 15:44:37

130 Tweets

224 Followers

146 Following



Day 11/30: Learning JAX for Machine Learning 🚀

Today, I learned about the full XLA pipeline, from tracing to hardware codegen and inference optimization!

The XLA compiler stack is the engine that makes JAX fast. It transforms Python jaxpr (a custom JAX-native IR) into…
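The tweet is cut off above, but the start of that pipeline is easy to poke at yourself: `jax.make_jaxpr` prints the jaxpr recorded by tracing a Python function, and `jax.jit(...).lower(...)` shows the (Stable)HLO text that XLA then optimizes and compiles to hardware. A minimal sketch (the function `f` is just an illustrative example):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) ** 2 + jnp.cos(x) ** 2

x = jnp.arange(4.0)

# Tracing: jaxpr is JAX's own IR, recorded by tracing the Python function.
print(jax.make_jaxpr(f)(x))

# Lowering: jit hands the jaxpr to XLA, which emits (Stable)HLO
# before backend-specific optimization and hardware codegen.
print(jax.jit(f).lower(x).as_text())
```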


Day 12/30: Learning JAX for Machine Learning 🚀

Today, I built a custom JAX Pallas TPU kernel for Matrix Multiplication, mapped out TPU compute architecture, and learned about the Mosaic compilation stack!

TPUs brilliantly divide compute labor between two specialized cores; …
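This tweet is also truncated, but a single-block Pallas matmul gives the flavor of what such a kernel looks like. This is an illustrative sketch, not the kernel from the tweet; `interpret=True` lets it run in interpreter mode without a TPU attached:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def matmul_kernel(a_ref, b_ref, o_ref):
    # The refs point at blocks staged into fast on-chip memory (VMEM on TPU);
    # the @ below is the part that maps onto the MXU, the TPU's matmul core.
    o_ref[...] = a_ref[...] @ b_ref[...]

def matmul(a, b):
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((a.shape[0], b.shape[1]), a.dtype),
        interpret=True,  # run on CPU in interpreter mode; drop this on a TPU
    )(a, b)

a = jnp.ones((128, 128), jnp.float32)
b = jnp.ones((128, 128), jnp.float32)
print(matmul(a, b).shape)  # (128, 128)
```

A real kernel would add a grid and BlockSpecs to tile large matrices across blocks; the single-block version above keeps the structure visible.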


Day 13/30: Learning JAX for Machine Learning 🚀

Today I revisited transformers, dove into Google's new Sequential Attention, and explored deep GPU architectures (compute, multi-instance GPUs)!

Ever wondered why we divide by the square root of d_k in the attention formula? I…
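The thread is cut off before the answer, but the standard reasoning goes like this: if the entries of q and k are roughly unit-variance, their dot product sums d_k terms and so has variance on the order of d_k; dividing by sqrt(d_k) keeps the logits O(1) and the softmax out of its saturated, vanishing-gradient regime. A minimal JAX sketch of scaled dot-product attention (shapes and names are illustrative):

```python
import jax
import jax.numpy as jnp

def scaled_dot_product_attention(q, k, v):
    d_k = q.shape[-1]
    # q @ k.T sums d_k products; for unit-variance entries its variance
    # grows like d_k, so dividing by sqrt(d_k) keeps logits O(1) and the
    # softmax away from its saturated regime.
    logits = q @ k.T / jnp.sqrt(d_k)
    return jax.nn.softmax(logits, axis=-1) @ v

key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(k_i, (8, 64)) for k_i in jax.random.split(key, 3))
print(scaled_dot_product_attention(q, k, v).shape)  # (8, 64)
```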

Day 22/30: Learning JAX for Machine Learning 🚀

This morning, I learned about the GPU's warp scheduler, execution states, and hiding latency with Little's Law. So I animated it.

Every cycle, the warp scheduler looks at all active warps and picks one that's ready to execute. The…
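Little's Law says the concurrency needed to keep a pipeline busy equals latency times throughput. Applied to a warp scheduler: the warps an SM must keep in flight to hide an instruction's latency is roughly that latency (in cycles) times the issue rate (warps per cycle). A back-of-the-envelope sketch; the numbers below are assumed for illustration, not specs for any real GPU:

```python
# Little's Law: concurrency = latency * throughput.
# Here: warps in flight = instruction latency (cycles) * issue rate (warps/cycle).

memory_latency_cycles = 400     # assumed global-memory load latency
issue_rate_warps_per_cycle = 1  # assumed scheduler issue rate

warps_needed = memory_latency_cycles * issue_rate_warps_per_cycle
print(f"Warps in flight to hide the latency: {warps_needed}")
```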


Day 25/30: Learning JAX for Machine Learning 🚀

CUDA Cores vs Tensor Cores - what's the difference?

CUDA Cores do 1 multiply per cycle. Tensor Cores do 64. Both sides are computing the same 4x4 matrix multiply: C = A x B.

Left (CUDA Core): Each output element is a dot…
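The contrast in the tweet can be mimicked on a CPU: the scalar triple loop below plays the role of CUDA-core-style one-multiply-per-step accumulation, where each output element is a dot product built up term by term, while the single fused `A @ B` stands in for a Tensor Core consuming the whole 4x4 tile in one operation. A sketch of the arithmetic, not a hardware model:

```python
import numpy as np

A = np.arange(16, dtype=np.float32).reshape(4, 4)
B = np.ones((4, 4), dtype=np.float32)

# "CUDA Core" style: one scalar multiply-accumulate at a time.
# C[i, j] is a dot product of row A[i, :] with column B[:, j],
# built up over 4 separate multiplies.
C_scalar = np.zeros((4, 4), dtype=np.float32)
for i in range(4):
    for j in range(4):
        for k in range(4):
            C_scalar[i, j] += A[i, k] * B[k, j]

# "Tensor Core" style: the whole 4x4 matmul as one fused operation.
C_fused = A @ B

assert np.allclose(C_scalar, C_fused)
print(C_fused)
```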