Ying Zhang (@ipiszy)

Software Developer @ xAI, ex-Meta / Google

Past projects: FlashAttention-3, AITemplate, TorchInductor

ID: 14970860

Joined: 01-06-2008 11:38:21

6 Tweets

414 Followers

180 Following

Tri Dao (@tri_dao):

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS! 1/

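For context on what these kernels accelerate, here is a minimal sketch of invoking fused attention through the flash_attn Python package's public `flash_attn_func` (FlashAttention-3's Hopper build exposes a similar interface; the shapes and dtype requirements below follow the package's documented API, not this thread):

```python
# Minimal sketch: calling fused FlashAttention via the flash_attn package.
# Assumes an NVIDIA GPU and half-precision inputs, per the package's API;
# the quoted FLOPS numbers specifically target H100-class (Hopper) hardware.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 128

# flash_attn_func expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16.
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused kernel computes softmax(q @ k^T / sqrt(headdim)) @ v without
# materializing the full seqlen x seqlen attention matrix.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (2, 4096, 16, 128)
```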