Ali Hassani (@alihassanijr)'s Twitter Profile
Ali Hassani

@alihassanijr

Computer Science PhD Student at Georgia Tech.
The revolution will be performance-optimized.

ID: 928595497748541441

Link: https://alihassanijr.com · Joined: 09-11-2017 12:09:53

123 Tweets

235 Followers

44 Following

HippoML (@hippoml_com)

Sharing our efforts in optimizing attention on Apple Silicon: up to 80X speedup. #applesilicon #transformers #ai blog.hippoml.com/up-to-80x-spee…
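
For scale, a minimal unoptimized baseline one might compare such kernels against is PyTorch's built-in scaled dot-product attention running on the Apple Silicon (MPS) backend. This is purely an illustrative sketch with made-up shapes, not HippoML's implementation:

```python
import torch
import torch.nn.functional as F

# Baseline attention on Apple Silicon via PyTorch's MPS backend.
# Illustrative only; HippoML's optimized kernels are not public here.
device = "mps" if torch.backends.mps.is_available() else "cpu"
q = torch.randn(1, 8, 1024, 64, device=device)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 1024, 64, device=device)
v = torch.randn(1, 8, 1024, 64, device=device)
out = F.scaled_dot_product_attention(q, k, v)
```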

HippoML (@hippoml_com)

Introducing PrivateCanvas. Harness the power of your local GPU for continuous editing and generation with cutting-edge models like large language models, SDXL, Segment Anything, and GANs. Experience top-tier performance with minimal hardware demands. blog.hippoml.com/super-ai-creat…

Ali Hassani (@alihassanijr)

Just pushed out the first new NATTEN release in over 10 months. It includes our new GEMM kernels for SM70 and above, forward-mode AD support, support for nested tensors (inference only), 3D NA (naive kernels only), BF16 support for compatible devices, and more. shi-labs.com/natten/
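
To make the release concrete, here is a minimal sketch of calling NATTEN's 2D neighborhood attention module. The keyword arguments and the channels-last input layout are assumptions based on NATTEN's documented module interface; check shi-labs.com/natten/ for the exact API:

```python
import torch
from natten import NeighborhoodAttention2D

# Illustrative sizes; assumes NATTEN's channels-last (B, H, W, C) input layout.
na = NeighborhoodAttention2D(dim=64, num_heads=4, kernel_size=7, dilation=1)
x = torch.randn(1, 32, 32, 64)
y = na(x)  # output has the same shape as x: (1, 32, 32, 64)

# The BF16 support mentioned in the release (on compatible devices) would
# presumably look like:
# y = na.to(torch.bfloat16)(x.to(torch.bfloat16))
```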

HippoML (@hippoml_com)

FP8 HippoAttention is coming: up to 3X faster than FlashAttention-2 on H100, and more than 700 TFLOPS achieved on MI300X. #Transformers #GPU blog.hippoml.com/8bit-hippoatte…

Vijay (@drop_all_tables)

When FA2 was released, I wrote this tweet: x.com/DROP_ALL_TABLE… Today, I feel compelled to say the same thing again. It is so, so cool to see the DL community embrace and use CUTLASS and CuTe to develop novel algorithms that make Tensor Cores sing <3

Horace He (@chhillee)

For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10

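As a rough sketch of what "a few lines of PyTorch" means here, the snippet below writes causal masking as a score_mod and hands it to flex_attention. The API is in recent PyTorch releases under torch.nn.attention.flex_attention; the shapes and device are made up for illustration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Compiling flex_attention is what produces the fused kernel in practice.
flex_attention = torch.compile(flex_attention)

# Illustrative shapes: (batch, heads, sequence length, head dim) on a CUDA GPU.
B, H, S, D = 2, 4, 256, 64
q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")

# One attention variant in a few lines: causal masking written as a score_mod.
def causal(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

out = flex_attention(q, k, v, score_mod=causal)
```

Because score_mod receives the raw attention score along with the batch, head, query, and key indices, many variants (ALiBi, sliding windows, prefix-LM, and so on) can be expressed the same way and share the fused kernel machinery.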