Joe Fioti (@joefioti)'s Twitter Profile
Joe Fioti

@joefioti

it's not possible. it's necessary. luminalai.com joefioti.com

ID: 3330038775

Joined: 16-06-2015 18:51:48

709 Tweets

357 Followers

313 Following

Matthew Gunton (@matthewjgunton)

I just published a blog post on PyTorch Tensors at a low level.
3 Key Learnings:
💿 Strided Tensors store data contiguously in memory, using metadata (shape + stride) to describe access
🤖 Autograd builds dynamic computation graphs to automatically compute gradients—perfect for rapid
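A minimal sketch of the two ideas above in PyTorch; the variable names and the specific operations are illustrative, not taken from the post:

```python
import torch

# Strided tensors: one contiguous buffer, with shape + stride metadata
# describing how indices map into it.
x = torch.arange(12)            # one contiguous buffer: 0..11
m = x.view(3, 4)                # shape (3, 4), strides (4, 1)
print(m.stride())               # (4, 1): step 4 elements per row, 1 per column

# Transposing only swaps the metadata; the underlying storage is untouched.
t = m.t()                       # shape (4, 3), strides (1, 4)
print(t.is_contiguous())        # False: same buffer, different access pattern

# Element (i, j) of a strided tensor lives at flat offset
# i * stride[0] + j * stride[1].
i, j = 2, 1
assert t[i, j] == x[i * t.stride(0) + j * t.stride(1)]

# Autograd: the computation graph is recorded dynamically as ops run,
# then backward() walks it to accumulate gradients.
a = torch.tensor(3.0, requires_grad=True)
y = a * a + 2 * a               # graph built on the fly
y.backward()
print(a.grad)                   # dy/da = 2a + 2 = 8
```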
Joe Fioti (@joefioti)

Luminal can discover flash attention entirely automatically. We've been working towards this north star in our search compiler. Check out the prototype demo below ↓

Joe Fioti (@joefioti)

Since we’ve got a lot of new people following Luminal's progress, I figure we should go over where we are and where we’re going ↓

Joe Fioti (@joefioti)

I can't square this with the other Hazy post about how tensor cores make up 95% of the FLOPs on a GPU. Wouldn't it be massively beneficial?

Or is it because matvecs are so bandwidth-constrained compared to matmuls?

If someone from Hazy follows me or knows someone from Hazy, lmk!
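A back-of-the-envelope roofline sketch of the matvec-vs-matmul question; the function names are illustrative, the GPU figures are rough public specs, and the byte accounting assumes each operand is touched exactly once:

```python
# Arithmetic intensity (FLOPs per byte moved) for fp16 operands.
def matvec_intensity(n, bytes_per_elem=2):
    flops = 2 * n * n                               # n dot products of length n
    bytes_moved = bytes_per_elem * (n * n + 2 * n)  # matrix + input + output
    return flops / bytes_moved

def matmul_intensity(n, bytes_per_elem=2):
    flops = 2 * n ** 3
    bytes_moved = bytes_per_elem * (3 * n * n)      # A, B, C each touched once
    return flops / bytes_moved

n = 4096
print(f"matvec: {matvec_intensity(n):.2f} FLOP/byte")  # ~1: memory bound
print(f"matmul: {matmul_intensity(n):.1f} FLOP/byte")  # ~n/3: compute bound

# An H100-class GPU does roughly ~1000 TFLOP/s of dense fp16 tensor-core math
# on ~3.35 TB/s of HBM, so it needs ~300 FLOP/byte to keep the tensor cores
# fed. A matvec at ~1 FLOP/byte runs at memory bandwidth no matter how fast
# the ALUs are, which would explain why tensor cores don't help matvecs much.
```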