Ali Hassani (@alihassanijr)'s Twitter Profile
Ali Hassani

@alihassanijr

Computer Science PhD Student at Georgia Tech.
The revolution will be performance-optimized.

ID: 928595497748541441

Link: https://alihassanijr.com · Joined: 09-11-2017 12:09:53

123 Tweets

235 Followers

44 Following

HippoML (@hippoml_com)

Sharing our efforts in optimizing attention on Apple Silicon: up to 80X speedup. #applesilicon #transformers #ai blog.hippoml.com/up-to-80x-spee…
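
For scale, a minimal unoptimized baseline one might compare such kernels against is PyTorch's built-in scaled dot-product attention running on the Apple Silicon (MPS) backend. This is purely an illustrative sketch with made-up shapes, not HippoML's implementation:

```python
import torch
import torch.nn.functional as F

# Baseline attention on Apple Silicon via PyTorch's MPS backend.
# Illustrative only; HippoML's optimized kernels are not public here.
device = "mps" if torch.backends.mps.is_available() else "cpu"
q = torch.randn(1, 8, 1024, 64, device=device)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 1024, 64, device=device)
v = torch.randn(1, 8, 1024, 64, device=device)
out = F.scaled_dot_product_attention(q, k, v)
```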

HippoML (@hippoml_com)

Introducing PrivateCanvas. Harness the power of your local GPU for continuous editing and generation with cutting-edge models like large language models, SDXL, Segment Anything, and GANs. Experience top-tier performance with minimal hardware demands. blog.hippoml.com/super-ai-creat…

Ali Hassani (@alihassanijr)

Just pushed out the first new NATTEN release in over 10 months. It includes our new GEMM kernels for SM70 and above, forward-mode AD support, support for nested tensors (inference only), 3D NA (naive kernels only), BF16 support for compatible devices, and more. shi-labs.com/natten/
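
To make the release concrete, here is a minimal sketch of calling NATTEN's 2D neighborhood attention module. The keyword arguments and the channels-last input layout are assumptions based on NATTEN's documented module interface; check shi-labs.com/natten/ for the exact API:

```python
import torch
from natten import NeighborhoodAttention2D

# Illustrative sizes; assumes NATTEN's channels-last (B, H, W, C) input layout.
na = NeighborhoodAttention2D(dim=64, num_heads=4, kernel_size=7, dilation=1)
x = torch.randn(1, 32, 32, 64)
y = na(x)  # output has the same shape as x: (1, 32, 32, 64)

# The BF16 support mentioned in the release (on compatible devices) would
# presumably look like:
# y = na.to(torch.bfloat16)(x.to(torch.bfloat16))
```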

HippoML (@hippoml_com)

FP8 HippoAttention is coming: up to 3X faster than FlashAttention-2 on H100, and more than 700 TFLOPS achieved on MI300X. #Transformers #GPU blog.hippoml.com/8bit-hippoatte…

Vijay (@drop_all_tables)

When FA2 was released, I wrote this tweet: x.com/DROP_ALL_TABLE… Today, I feel compelled to say the same thing again. It is so, so cool to see the DL community embrace and use CUTLASS and CuTe to develop novel algorithms that make Tensor Cores sing <3

Horace He (@chhillee)

For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10

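As a rough sketch of what "a few lines of PyTorch" means here, the snippet below writes causal masking as a score_mod and hands it to flex_attention. The API is in recent PyTorch releases under torch.nn.attention.flex_attention; the shapes and device are made up for illustration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compiling flex_attention is what produces the fused kernel in practice.
flex_attention = torch.compile(flex_attention)

# Illustrative shapes: (batch, heads, sequence length, head dim) on a CUDA GPU.
B, H, S, D = 2, 4, 256, 64
q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")

# One attention variant in a few lines: causal masking written as a score_mod.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

out = flex_attention(q, k, v, score_mod=causal)
```

Because score_mod receives the raw attention score along with the batch, head, query, and key indices, many variants (ALiBi, sliding windows, prefix-LM, and so on) can be expressed the same way and share the fused kernel machinery.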