Horace He (@chhillee)'s Twitter Profile
Horace He

@chhillee

@thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale

ID: 117233133

Link: https://www.thonking.ai/p/strangely-matrix-multiplications

Joined: 24-02-2010 23:48:25

3.3K Tweets

34.34K Followers

524 Following

Horace He (@chhillee)

Guys I’ve been doing some reading and I’ve discovered something that might crash the *entire* tech industry(!) Apparently Taiwan can manufacture transistors far cheaper than American companies, and they’re going to drop the price >50x over the next 10 years. Short everything.

François Fleuret (@francoisfleuret)

It is hard to overstate how cool and powerful is flex attention. Horace He pytorch.org/blog/flexatten… TL;DR: it is an implementation of the attention operator in @pytorch that allows in particular to efficiently "carve" the attention matrix. 1/3

Horace He (@chhillee)

Most normal FlexAttention mask. Also, thanks for the "Implementation-wise, although FlexAttention practically enabled the project..." comment - that's perhaps the #1 thing we were hoping for with FlexAttention :)

Horace He (@chhillee)

I'll be here and talking about ML systems! There'll be some of the best GPU folk I know here, so come and learn more together about Blackwell GPUs!

Horace He (@chhillee)

This is pretty neat. They insert into torch.compile and insert some profile-guided optimizations as well as a bunch of other specific optimizations like offloading. Since torch.compile is all in Python all their compiler passes are fairly accessible too! github.com/deepspeedai/De…

Horace He (@chhillee)

When this word started popping up I initially smugly thought that people were misspelling "syncophant" only to realize that I'd entangled "sycophant" with "syncopation" in my head.

Horace He (@chhillee)

The fundamental question here (computing MFU) is a very reasonable question to ask in an interview (and I'd recommend learning it if you don't know how). However, the real interview question I would like to ask is this: "I see 3 assumptions in this question that range from