
Vijay
@__tensorcore__
MLIR, CUTLASS, Tensor Core arch @NVIDIA. Mechanic @hpcgarage. Exercise of any 1st amendment rights are for none other than myself.
ID: 3280272739
https://thakkarv.dev 15-07-2015 03:34:51
1.1K Tweets
1.1K Followers
493 Following

timelapse #58 (14.5 hrs):
- used cutlass python DSL to increase elementwise add/mul memory throughput (from pytorch 500GB/s to cutlass 850GB/s)
- diving into cutlass 4.0 (minus tile abstractions)
- cuda book design decisions with maharshi (महर्षि)
- restructure of 5 chapters
-

Another 🔥 blog about CUTLASS from Colfax International, this time focusing on the gory details of block-scaled MXFP and NVFP data types and Blackwell kernels for them. research.colfax-intl.com/cutlass-tutori…