Yifan Zhang (@yifan_zhang_)'s Twitter Profile
Yifan Zhang

@yifan_zhang_

Language & Thought

ID: 1585517096120655872

Joined: 27-10-2022 06:22:03

47 Tweets

284 Followers

265 Following

Yifan Zhang (@yifan_zhang_)

1/8 ⭐General Preference Modeling with Preference Representations for Aligning Language Models⭐ arxiv.org/abs/2410.02197 On Hugging Face Daily Papers: huggingface.co/papers/2410.02… We just dropped our latest research on General Preference Modeling (GPM)! 🚀
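
A minimal sketch of the preference-representation idea as described in the abstract: each response is embedded into a small preference vector, and a pair of responses is scored with a skew-symmetric operator, so score(i, j) = -score(j, i) by construction. The class name, dimensions, and the fixed block-diagonal operator below are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class PreferenceRepScorer(nn.Module):
    """Sketch: score response pairs via preference embeddings and a
    skew-symmetric operator, guaranteeing score(i, j) == -score(j, i)."""

    def __init__(self, hidden_dim: int, rep_dim: int = 8):
        super().__init__()
        assert rep_dim % 2 == 0
        self.embed = nn.Linear(hidden_dim, rep_dim)  # preference-representation head
        # fixed block-diagonal skew-symmetric operator made of 2x2 blocks [[0, 1], [-1, 0]]
        blocks = [torch.tensor([[0.0, 1.0], [-1.0, 0.0]]) for _ in range(rep_dim // 2)]
        self.register_buffer("R", torch.block_diag(*blocks))

    def forward(self, h_i: torch.Tensor, h_j: torch.Tensor) -> torch.Tensor:
        # h_i, h_j: (batch, hidden_dim) pooled LM features for two responses to one prompt
        v_i, v_j = self.embed(h_i), self.embed(h_j)
        score = torch.einsum("bd,de,be->b", v_i, self.R, v_j)  # antisymmetric in (i, j)
        return torch.sigmoid(score)  # modeled P(response i preferred over response j)
```

One motivation highlighted in the abstract is that such an antisymmetric pairwise score can express intransitive (e.g. cyclic) preferences that a scalar Bradley-Terry reward cannot.
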

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)

Tensor Product Attention Is All You Need

Tensor Product Attention reduces memory overhead by compressing the KV cache using tensor decompositions. The T6 Transformer, built on TPA, processes longer sequences efficiently and outperforms standard models across benchmarks.
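
A rough sketch of what "compressing the KV cache using tensor decompositions" can look like: each token's per-head keys (and, analogously, values) are formed as a small sum of rank-1 outer products between a head-factor vector and a feature-factor vector, and only those factors are cached. Names, shapes, and the rank below are assumptions for illustration, not the T6 reference implementation.

```python
import torch
import torch.nn as nn

class FactorizedKeys(nn.Module):
    """Sketch of contextual tensor-product factorization for keys.
    Values would get an analogous pair of factor projections."""

    def __init__(self, d_model: int, n_heads: int, d_head: int, rank: int = 2):
        super().__init__()
        self.n_heads, self.d_head, self.rank = n_heads, d_head, rank
        self.head_factor = nn.Linear(d_model, rank * n_heads)  # a_r(x_t) in R^{n_heads}
        self.dim_factor = nn.Linear(d_model, rank * d_head)    # b_r(x_t) in R^{d_head}

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model)
        B, T, _ = x.shape
        a = self.head_factor(x).view(B, T, self.rank, self.n_heads)
        b = self.dim_factor(x).view(B, T, self.rank, self.d_head)
        # K_t = (1/rank) * sum_r a_r outer b_r  ->  (batch, seq, n_heads, d_head)
        k = torch.einsum("btrh,btrd->bthd", a, b) / self.rank
        # cache (a, b): rank * (n_heads + d_head) numbers per token instead of n_heads * d_head
        return k, (a, b)
```
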
Yifan Zhang (@yifan_zhang_)

1/
Introducing “Tensor Product Attention Is All You Need” (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀

Ever wondered if there’s a more memory-efficient way to handle long contexts in LLMs? 

Homepage: tensorgi.github.io/T6
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Tensor Product Attention Is All You Need

Proposes Tensor Product Attention (TPA), a mechanism that factorizes Q, K, and V activations using contextual tensor decompositions to achieve a 10x or more reduction in inference-time KV cache size relative to the standard attention mechanism.
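
Back-of-envelope arithmetic for the "10x or more" claim, under assumed sizes (the head configuration and ranks below are illustrative, not taken from the paper's experiments): per token and per layer, standard multi-head attention caches the full K and V, while a rank-2 factorization caches only the factors.

```python
n_heads, d_head = 32, 128                           # assumed attention configuration
rank_k = rank_v = 2                                 # assumed TPA ranks for K and V

mha_cache = 2 * n_heads * d_head                    # full K + V      -> 8192 numbers/token/layer
tpa_cache = (rank_k + rank_v) * (n_heads + d_head)  # cached factors  -> 640 numbers/token/layer
print(mha_cache / tpa_cache)                        # 12.8, i.e. "10x or more" under these assumptions
```
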
Quanquan Gu (@quanquangu)

MHA-->GQA-->MLA--->TPA🚀🚀🚀 Introducing Tensor Product Attention (TPA). To reduce KV cache size, various Multi-Head Attention (MHA) variants have been developed, including Multi-Query Attention (MQA), Group Query Attention (GQA), and Multi-Head Latent Attention (MLA). GQA has

Krishna Mohan (@kmohan2006)

1/n

'Tensor Product Attention is all you need' paper 
Key Points ->

1. KV size reduction by using contextual tensor decomposition for each token 
2. Dividing hidden_dimension for each token into head dimension factor and token dimension factor and then combining using tensor
Quanquan Gu (@quanquangu)

Very cool! Who’d like to use FlashTPA? Drop a like if you want us to release it! MHA-->GQA-->MLA--->TPA🚀🚀 Paper: arxiv.org/pdf/2501.06425

Zhouliang Yu (@zhouliangy)

🚀 Excited to introduce FormalMATH: a large-scale formal math benchmark with 5,560 formally verified Lean 4 statements from Olympiad and UG-level problems.

📉 Best model performance: just 16.46% — plenty of room for progress!

🔗 Explore the project: spherelab.ai/FormalMATH/
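
For readers unfamiliar with the format, here is a small Lean 4 (with Mathlib) example of the general kind of formal statement such a benchmark collects; this toy inequality is purely illustrative and is not taken from FormalMATH.

```lean
import Mathlib

-- Toy Olympiad-style statement (illustrative only, not from the benchmark):
-- for real a and b, a^2 + b^2 ≥ 2ab.
theorem toy_ineq (a b : ℝ) : a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  nlinarith [sq_nonneg (a - b)]
```
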
Oriol Vinyals (@oriolvinyalsml)

Ahead of I/O, we’re releasing an updated Gemini 2.5 Pro! It’s now #1 on the WebDevArena leaderboard, breaking the 1400 Elo barrier! 🥇

Our most advanced coding model yet, with stronger performance on code transformation & editing. Excited to build drastic agents on top of this!
Gro-Tsen (@gro_tsen)

[Via jeanas.bsky.social on the non-Musky place.] And yes, this monstrosity is an actual commutative diagram from an actual math paper: “Comma 2-comonad I: Eilenberg-Moore 2-category of colax coalgebras” by Igor Baković arxiv.org/abs/2505.00682 (on page 53).