anirudh bv (@anirudhbv_ce)'s Twitter Profile
anirudh bv

@anirudhbv_ce

mle @ shopify
mlh top 50
uwaterloo ce '30


Link: https://v0-anidev.vercel.app/ · Joined: 11-09-2025 15:44:37

130 Tweets

224 Followers

146 Following



Day 11/30: Learning JAX for Machine Learning 🚀

Today, I learned about the full XLA pipeline, from tracing to hardware codegen and inference optimization!

The XLA compiler stack is the engine that makes JAX fast. It transforms Python jaxpr (a custom JAX-native IR) into…
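The tweet is cut off above, but the start of that pipeline is easy to poke at yourself: `jax.make_jaxpr` prints the jaxpr recorded by tracing a Python function, and `jax.jit(...).lower(...)` shows the (Stable)HLO text that XLA then optimizes and compiles to hardware. A minimal sketch (the function `f` is just an illustrative example):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) ** 2 + jnp.cos(x) ** 2

x = jnp.arange(4.0)

# Tracing: jaxpr is JAX's own IR, recorded by tracing the Python function.
print(jax.make_jaxpr(f)(x))

# Lowering: jit hands the jaxpr to XLA, which emits (Stable)HLO
# before backend-specific optimization and hardware codegen.
print(jax.jit(f).lower(x).as_text())
```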


Day 12/30: Learning JAX for Machine Learning 🚀

Today, I built a custom JAX Pallas TPU kernel for Matrix Multiplication, mapped out TPU compute architecture, and learned about the Mosaic compilation stack!

TPUs brilliantly divide compute labor between two specialized cores; …
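This tweet is also truncated, but a single-block Pallas matmul gives the flavor of what such a kernel looks like. This is an illustrative sketch, not the kernel from the tweet; `interpret=True` lets it run in interpreter mode without a TPU attached:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def matmul_kernel(a_ref, b_ref, o_ref):
    # The refs point at blocks staged into fast on-chip memory (VMEM on TPU);
    # the @ below is the part that maps onto the MXU, the TPU's matmul core.
    o_ref[...] = a_ref[...] @ b_ref[...]

def matmul(a, b):
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((a.shape[0], b.shape[1]), a.dtype),
        interpret=True,  # run on CPU in interpreter mode; drop this on a TPU
    )(a, b)

a = jnp.ones((128, 128), jnp.float32)
b = jnp.ones((128, 128), jnp.float32)
print(matmul(a, b).shape)  # (128, 128)
```

A real kernel would add a grid and BlockSpecs to tile large matrices across blocks; the single-block version above keeps the structure visible.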


Day 13/30: Learning JAX for Machine Learning 🚀

Today I revisited transformers, dove into Google's new Sequential Attention, and explored deep GPU architectures (compute, multi-instance GPUs)!

Ever wondered why we divide by the square root of d_k in the attention formula? I…
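The thread is cut off before the answer, but the standard reasoning goes like this: if the entries of q and k are roughly unit-variance, their dot product sums d_k terms and so has variance on the order of d_k; dividing by sqrt(d_k) keeps the logits O(1) and the softmax out of its saturated, vanishing-gradient regime. A minimal JAX sketch of scaled dot-product attention (shapes and names are illustrative):

```python
import jax
import jax.numpy as jnp

def scaled_dot_product_attention(q, k, v):
    d_k = q.shape[-1]
    # q @ k.T sums d_k products; for unit-variance entries its variance
    # grows like d_k, so dividing by sqrt(d_k) keeps logits O(1) and the
    # softmax away from its saturated regime.
    logits = q @ k.T / jnp.sqrt(d_k)
    return jax.nn.softmax(logits, axis=-1) @ v

key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(k_i, (8, 64)) for k_i in jax.random.split(key, 3))
print(scaled_dot_product_attention(q, k, v).shape)  # (8, 64)
```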

Day 22/30: Learning JAX for Machine Learning 🚀

This morning, I learned about the GPU's warp scheduler, execution states, and hiding latency with Little's Law. So I animated it.

Every cycle, the warp scheduler looks at all active warps and picks one that's ready to execute. The…
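Little's Law says the concurrency needed to keep a pipeline busy equals latency times throughput. Applied to a warp scheduler: the warps an SM must keep in flight to hide an instruction's latency is roughly that latency (in cycles) times the issue rate (warps per cycle). A back-of-the-envelope sketch; the numbers below are assumed for illustration, not specs for any real GPU:

```python
# Little's Law: concurrency = latency * throughput.
# Here: warps in flight = instruction latency (cycles) * issue rate (warps/cycle).

memory_latency_cycles = 400     # assumed global-memory load latency
issue_rate_warps_per_cycle = 1  # assumed scheduler issue rate

warps_needed = memory_latency_cycles * issue_rate_warps_per_cycle
print(f"Warps in flight to hide the latency: {warps_needed}")
```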


Day 25/30: Learning JAX for Machine Learning 🚀

CUDA Cores vs Tensor Cores - what's the difference?

CUDA Cores do 1 multiply per cycle. Tensor Cores do 64. Both sides are computing the same 4x4 matrix multiply: C = A x B.

Left (CUDA Core): Each output element is a dot…
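The contrast in the tweet can be mimicked on a CPU: the scalar triple loop below plays the role of CUDA-core-style one-multiply-per-step accumulation, where each output element is a dot product built up term by term, while the single fused `A @ B` stands in for a Tensor Core consuming the whole 4x4 tile in one operation. A sketch of the arithmetic, not a hardware model:

```python
import numpy as np

A = np.arange(16, dtype=np.float32).reshape(4, 4)
B = np.ones((4, 4), dtype=np.float32)

# "CUDA Core" style: one scalar multiply-accumulate at a time.
# C[i, j] is a dot product of row A[i, :] with column B[:, j],
# built up over 4 separate multiplies.
C_scalar = np.zeros((4, 4), dtype=np.float32)
for i in range(4):
    for j in range(4):
        for k in range(4):
            C_scalar[i, j] += A[i, k] * B[k, j]

# "Tensor Core" style: the whole 4x4 matmul as one fused operation.
C_fused = A @ B

assert np.allclose(C_scalar, C_fused)
print(C_fused)
```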