1/8 ⭐General Preference Modeling with Preference Representations for Aligning Language Models⭐ arxiv.org/abs/2410.02197
On Hugging Face Daily Papers: huggingface.co/papers/2410.02…
We just dropped our latest research on General Preference Modeling (GPM)! 🚀
Tensor Product Attention Is All You Need
Tensor Product Attention reduces memory overhead by compressing the KV cache with tensor decompositions. The T6 Transformer, built on TPA, processes longer sequences efficiently and outperforms standard models across benchmarks.
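To make the idea concrete, here is a minimal sketch of the factorization for a single token. All names (W_a, W_b, a, b) and shapes are illustrative assumptions, not the paper's actual configuration:

```python
import torch

# Toy shapes (illustrative, not the paper's config): h heads, d_h per-head dim, d_model hidden size.
h, d_h, d_model, rank = 8, 64, 512, 2
x = torch.randn(d_model)                # hidden state of one token

# Contextual factor projections (learned in the real model; random here, just to show shapes).
W_a = torch.randn(rank, h, d_model)     # maps x to head-dimension factors a_r in R^h
W_b = torch.randn(rank, d_h, d_model)   # maps x to token-dimension factors b_r in R^{d_h}

a = torch.einsum('rhd,d->rh', W_a, x)   # (rank, h)
b = torch.einsum('rkd,d->rk', W_b, x)   # (rank, d_h)

# Full per-token key across all heads, rebuilt as a sum of outer products of the factors.
K = torch.einsum('rh,rk->hk', a, b) / rank   # (h, d_h)

# Only a and b go into the KV cache: rank*(h + d_h) numbers instead of h*d_h.
print(K.shape, rank * (h + d_h), h * d_h)
```

Because the factors are much smaller than the full per-head keys and values, caching them instead of K and V is where the memory savings come from.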
1/
Introducing “Tensor Product Attention Is All You Need” (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀
Ever wondered if there’s a more memory-efficient way to handle long contexts in LLMs?
Homepage: tensorgi.github.io/T6
Tensor Product Attention Is All You Need
Proposes Tensor Product Attention (TPA), a mechanism that factorizes Q, K, and V activations with contextual tensor decompositions, achieving a 10x or larger reduction in inference-time KV cache size relative to standard multi-head attention
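Where a "10x or more" figure can come from is easiest to see with back-of-the-envelope cache arithmetic. The hyperparameters below are assumptions for illustration, not the paper's reported settings:

```python
# Per-token, per-layer KV cache sizes (number of stored values), assumed hyperparameters.
h, d_h = 32, 128            # 32 heads of dimension 128
rank_k, rank_v = 2, 2       # assumed TPA factor ranks for K and V

standard = 2 * h * d_h                    # full K and V for every head -> 8192 values
tpa = (rank_k + rank_v) * (h + d_h)       # only the small factors are cached -> 640 values

print(standard, tpa, round(standard / tpa, 1))   # 8192 640 12.8
```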
MHA --> GQA --> MLA --> TPA 🚀🚀🚀
Introducing Tensor Product Attention (TPA).
To reduce KV cache size, various Multi-Head Attention (MHA) variants have been developed, including Multi-Query Attention (MQA), Group Query Attention (GQA), and Multi-Head Latent Attention (MLA). GQA shares each key/value head across a group of query heads, while MLA caches a compressed latent instead of full keys and values.
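For a rough sense of how the variants compare, the snippet below tallies per-token KV cache sizes under assumed head counts, group size, latent width, and rank (all numbers are illustrative, and the MLA entry is simplified):

```python
# Rough per-token, per-layer KV cache sizes for each variant (assumed settings,
# not taken from any specific model card).
h, d_h = 32, 128      # 32 query heads, head dim 128
g = 8                 # GQA: 8 shared key/value heads
d_latent = 512        # MLA: width of the cached compressed latent (ignores its extra rotary key)
rank = 2              # TPA: assumed factor rank for each of K and V

cache = {
    "MHA": 2 * h * d_h,           # full K and V for every head
    "MQA": 2 * 1 * d_h,           # a single shared K/V head
    "GQA": 2 * g * d_h,           # one K/V head per query group
    "MLA": d_latent,              # one compressed latent per token
    "TPA": 2 * rank * (h + d_h),  # only the K and V factors
}
for name, n in cache.items():
    print(f"{name}: {n} cached values per token")
```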
1/n
'Tensor Product Attention is all you need' paper
Key Points ->
1. KV cache size is cut by computing a contextual tensor decomposition for each token, so only small factors need to be cached
2. Each token's activation is split into a head-dimension factor and a token-dimension factor, then recombined with a tensor (outer) product (see the sketch below)
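A toy end-to-end step following points 1 and 2, with assumed shapes and rank: only the factors are "cached", full K and V are rebuilt from them, and ordinary attention runs on top. (In TPA the query is factorized too; it is kept dense here for brevity.)

```python
import torch
import torch.nn.functional as F

# Assumed toy sizes: h heads, d_h per-head dim, factor rank, sequence length.
h, d_h, rank, seq = 4, 16, 2, 5

a_k = torch.randn(seq, rank, h)     # cached K head-dimension factors
b_k = torch.randn(seq, rank, d_h)   # cached K token-dimension factors
a_v = torch.randn(seq, rank, h)     # cached V head-dimension factors
b_v = torch.randn(seq, rank, d_h)   # cached V token-dimension factors
q = torch.randn(h, d_h)             # current token's (dense) query

K = torch.einsum('srh,srd->shd', a_k, b_k) / rank    # rebuild keys:   (seq, h, d_h)
V = torch.einsum('srh,srd->shd', a_v, b_v) / rank    # rebuild values: (seq, h, d_h)

scores = torch.einsum('hd,shd->hs', q, K) / d_h ** 0.5   # per-head scores over the sequence
attn = F.softmax(scores, dim=-1)                         # attention weights per head
out = torch.einsum('hs,shd->hd', attn, V)                # (h, d_h) per-head outputs
print(out.shape)
```

The attention computation itself is unchanged; what changes is how K and V are produced and what gets stored between decoding steps.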
🚀 Excited to introduce FormalMATH: a large-scale formal math benchmark with 5,560 formally verified Lean 4 statements from Olympiad and UG-level problems.
📉 Best model performance: just 16.46% — plenty of room for progress!
🔗 Explore the project: spherelab.ai/FormalMATH/
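To show what the statement format looks like, here is a toy Lean 4 theorem in the same spirit (a simple inequality with a Mathlib proof; it is not an entry from FormalMATH):

```lean
import Mathlib

-- Toy Olympiad-flavored statement with a machine-checked proof, illustrating the
-- "formally verified Lean 4 statement" format (illustrative only, not from FormalMATH).
theorem two_mul_le_sq_add_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  -- follows from (a - b)^2 ≥ 0
  nlinarith [sq_nonneg (a - b)]
```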
Ahead of I/O, we’re releasing an updated Gemini 2.5 Pro! It’s now #1 on the WebDevArena leaderboard, breaking the 1400 Elo barrier! 🥇
Our most advanced coding model yet, with stronger performance on code transformation & editing. Excited to build agents on top of this!
[Via jeanas.bsky.social on the non-Musky place.]
And yes, this monstrosity is an actual commutative diagram from an actual math paper: “Comma 2-comonad I: Eilenberg-Moore 2-category of colax coalgebras” by Igor Baković arxiv.org/abs/2505.00682 (on page 53).