Yifan Zhang (@yifan_zhang_)'s Twitter Profile
Yifan Zhang

@yifan_zhang_

Language & Thought

ID: 1585517096120655872

Joined: 27-10-2022 06:22:03

47 Tweets

284 Followers

265 Following

Yifan Zhang (@yifan_zhang_)

1/8 โญGeneral Preference Modeling with Preference Representations for Aligning Language Modelsโญ arxiv.org/abs/2410.02197 As Huggingface Daily Papers: huggingface.co/papers/2410.02โ€ฆ We just dropped our latest research on General Preference Modeling (GPM)! ๐Ÿš€

๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ (@gm8xx8) 's Twitter Profile Photo

Tensor Product Attention Is All You Need

Tensor Product Attention reduces memory overhead by compressing the KV cache using tensor decompositions. The T6 Transformer, built on TPA, processes longer sequences efficiently and outperforms standard models across benchmarks.
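
To make the compression concrete, here is a minimal sketch of the kind of factorization this summary describes: a token's keys across all heads form an (n_heads, d_head) matrix, written as a small sum of outer products of a head-side factor and a dimension-side factor, and only those factors are cached. The shapes, rank, and function names below are assumptions for illustration, not the paper's implementation.

```python
# Sketch of tensor-product-factorized keys (illustrative; values are handled
# the same way). Instead of caching the full (n_heads, d_head) key matrix per
# token, cache small factors a (rank, n_heads) and b (rank, d_head) and
# reconstruct K_t = (1/rank) * sum_r outer(a_r, b_r) when attention needs it.
import numpy as np

rng = np.random.default_rng(0)
n_heads, d_head, rank = 32, 128, 2   # assumed sizes for illustration

def keys_from_factors(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """a: (rank, n_heads), b: (rank, d_head) -> full keys of shape (n_heads, d_head)."""
    return np.einsum("rh,rd->hd", a, b) / a.shape[0]

# In the real model the factors come from small linear projections of the
# token's hidden state (the "contextual" part of the decomposition); random
# stand-ins are enough to show the shapes and the cache contents.
a_t = rng.normal(size=(rank, n_heads))   # cached head-side factors
b_t = rng.normal(size=(rank, d_head))    # cached dimension-side factors
K_t = keys_from_factors(a_t, b_t)        # what attention actually consumes
print(K_t.shape)                         # (32, 128)
```
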
Yifan Zhang (@yifan_zhang_)

1/
Introducing "Tensor Product Attention Is All You Need" (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀

Ever wondered if there's a more memory-efficient way to handle long contexts in LLMs?

Homepage: tensorgi.github.io/T6
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Tensor Product Attention Is All You Need

Proposes Tensor Product Attention (TPA), a mechanism that factorizes Q, K, and V activations using contextual tensor decompositions to achieve a 10x or greater reduction in inference-time KV cache size relative to the standard attention mechanism.
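
To see where a "10x or more" figure could come from, here is a back-of-the-envelope count of floats cached per token per layer; the head count, head size, and ranks are placeholders, not the paper's configuration.

```python
# Back-of-the-envelope KV-cache comparison (assumed shapes, illustrative only).
n_heads, d_head = 32, 128          # per-layer attention geometry (assumed)
rank_k, rank_v = 2, 2              # factorization ranks for K and V (assumed)

standard = 2 * n_heads * d_head                     # full K and V: 8192 floats/token
factored = (rank_k + rank_v) * (n_heads + d_head)   # K and V factors: 640 floats/token
print(standard, factored, standard / factored)      # 8192 640 12.8
```
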
Quanquan Gu (@quanquangu)

MHA --> GQA --> MLA --> TPA 🚀🚀🚀

Introducing Tensor Product Attention (TPA). To reduce KV cache size, various Multi-Head Attention (MHA) variants have been developed, including Multi-Query Attention (MQA), Group Query Attention (GQA), and Multi-Head Latent Attention (MLA). GQA has
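
For context on this progression, the snippet below tallies roughly how many floats each variant keeps in the KV cache per token per layer. The MHA/MQA/GQA counts follow their standard definitions; the MLA latent width and the TPA ranks are assumed placeholders, not values from any specific model.

```python
# Rough per-token, per-layer KV-cache cost (in floats) for attention variants.
# MHA/MQA/GQA follow their standard definitions; the MLA latent width and TPA
# ranks below are assumed placeholders for illustration.
n_heads, d_head = 32, 128
n_kv_groups = 8        # GQA: number of shared key/value head groups (assumed)
mla_latent = 512       # MLA: width of the compressed KV latent (assumed; real MLA
                       # also caches a small decoupled rotary key)
rank_k = rank_v = 2    # TPA: factorization ranks (assumed)

cache_per_token = {
    "MHA": 2 * n_heads * d_head,                    # full K and V for every head
    "MQA": 2 * 1 * d_head,                          # one shared K/V head
    "GQA": 2 * n_kv_groups * d_head,                # one K/V head per group
    "MLA": mla_latent,                              # one compressed latent vector
    "TPA": (rank_k + rank_v) * (n_heads + d_head),  # K and V factors only
}
for name, floats in cache_per_token.items():
    print(f"{name}: {floats} floats per token per layer")
```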

Krishna Mohan (@kmohan2006)

1/n

'Tensor Product Attention is all you need' paper 
Key Points ->

1. KV size reduction by using contextual tensor decomposition for each token 
2. Dividing hidden_dimension for each token into head dimension factor and token dimension factor and then combining using tensor
Quanquan Gu (@quanquangu)

Very cool! Who'd like to use FlashTPA? Drop a like if you want us to release it! MHA --> GQA --> MLA --> TPA 🚀🚀 Paper: arxiv.org/pdf/2501.06425

Zhouliang Yu (@zhouliangy)

🚀 Excited to introduce FormalMATH: a large-scale formal math benchmark with 5,560 formally verified Lean 4 statements from Olympiad and UG-level problems.

📉 Best model performance: just 16.46%, plenty of room for progress!

🔗 Explore the project: spherelab.ai/FormalMATH/
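
For readers who have not seen Lean 4: a benchmark entry is a formally stated theorem that a prover must then prove inside the proof assistant. The toy statement below is invented for illustration (it is not taken from FormalMATH) and carries a short Mathlib proof, just to show what "formally verified" means in practice; the lemma names are my best recollection of Mathlib's API.

```lean
import Mathlib

-- Toy illustration of a formally verified Lean 4 statement (not from FormalMATH):
-- the product of two consecutive natural numbers is always even.
theorem consecutive_product_even (n : ℕ) : Even (n * (n + 1)) := by
  rcases Nat.even_or_odd n with h | h
  · exact h.mul_right _         -- n is even, so n * (n + 1) is even
  · exact h.add_one.mul_left _  -- n is odd, so n + 1 is even, so n * (n + 1) is even
```
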
Oriol Vinyals (@oriolvinyalsml)

Ahead of I/O, we're releasing an updated Gemini 2.5 Pro! It's now #1 on the WebDevArena leaderboard, breaking the 1400 ELO barrier! 🥇

Our most advanced coding model yet, with stronger performance on code transformation & editing. Excited to build drastic agents on top of this!
Gro-Tsen (@gro_tsen)

[Via jeanas.bsky.social on the non-Musky place.] And yes, this monstrosity is an actual commutative diagram from an actual math paper: "Comma 2-comonad I: Eilenberg-Moore 2-category of colax coalgebras" by Igor Baković arxiv.org/abs/2505.00682 (on page 53).