Ying Zhang (@ipiszy)

Software Developer @ xAI, ex-Meta / Google

Past projects: FlashAttention-3, AITemplate, TorchInductor

ID: 14970860

Joined: 01-06-2008 11:38:21

6 Tweets

414 Followers

180 Following

Tri Dao (@tri_dao):

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS! 1/

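For context on what these kernels accelerate, here is a minimal sketch of invoking fused attention through the flash_attn Python package's public `flash_attn_func` (FlashAttention-3's Hopper build exposes a similar interface; the shapes and dtype requirements below follow the package's documented API, not this thread):

```python
# Minimal sketch: calling fused FlashAttention via the flash_attn package.
# Assumes an NVIDIA GPU and half-precision inputs, per the package's API;
# the quoted FLOPS numbers specifically target H100-class (Hopper) hardware.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 128

# flash_attn_func expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16.
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused kernel computes softmax(q @ k^T / sqrt(headdim)) @ v without
# materializing the full seqlen x seqlen attention matrix.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (2, 4096, 16, 128)
```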