Yifan Zhang (@yifan_zhang_)'s Twitter Profile
Yifan Zhang

@yifan_zhang_

Language & Thought

ID: 1585517096120655872

Joined: 27-10-2022 06:22:03

47 Tweets

284 Followers

265 Following

Yifan Zhang (@yifan_zhang_)

1/8 ⭐General Preference Modeling with Preference Representations for Aligning Language Models⭐ arxiv.org/abs/2410.02197 On Hugging Face Daily Papers: huggingface.co/papers/2410.02… We just dropped our latest research on General Preference Modeling (GPM)! 🚀
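
A minimal sketch of the preference-representation idea as described in the abstract: each response is embedded into a small preference vector, and a pair of responses is scored with a skew-symmetric operator, so score(i, j) = -score(j, i) by construction. The class name, dimensions, and the fixed block-diagonal operator below are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class PreferenceRepScorer(nn.Module):
    """Sketch: score response pairs via preference embeddings and a
    skew-symmetric operator, guaranteeing score(i, j) == -score(j, i)."""

    def __init__(self, hidden_dim: int, rep_dim: int = 8):
        super().__init__()
        assert rep_dim % 2 == 0
        self.embed = nn.Linear(hidden_dim, rep_dim)  # preference-representation head
        # fixed block-diagonal skew-symmetric operator made of 2x2 blocks [[0, 1], [-1, 0]]
        blocks = [torch.tensor([[0.0, 1.0], [-1.0, 0.0]]) for _ in range(rep_dim // 2)]
        self.register_buffer("R", torch.block_diag(*blocks))

    def forward(self, h_i: torch.Tensor, h_j: torch.Tensor) -> torch.Tensor:
        # h_i, h_j: (batch, hidden_dim) pooled LM features for two responses to one prompt
        v_i, v_j = self.embed(h_i), self.embed(h_j)
        score = torch.einsum("bd,de,be->b", v_i, self.R, v_j)  # antisymmetric in (i, j)
        return torch.sigmoid(score)  # modeled P(response i preferred over response j)
```

One motivation highlighted in the abstract is that such an antisymmetric pairwise score can express intransitive (e.g. cyclic) preferences that a scalar Bradley-Terry reward cannot.
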

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)

Tensor Product Attention Is All You Need

Tensor Product Attention reduces memory overhead by compressing the KV cache using tensor decompositions. The T6 Transformer, built on TPA, processes longer sequences efficiently and outperforms standard models across benchmarks.
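
A rough sketch of what "compressing the KV cache using tensor decompositions" can look like: each token's per-head keys (and, analogously, values) are formed as a small sum of rank-1 outer products between a head-factor vector and a feature-factor vector, and only those factors are cached. Names, shapes, and the rank below are assumptions for illustration, not the T6 reference implementation.

```python
import torch
import torch.nn as nn

class FactorizedKeys(nn.Module):
    """Sketch of contextual tensor-product factorization for keys.
    Values would get an analogous pair of factor projections."""

    def __init__(self, d_model: int, n_heads: int, d_head: int, rank: int = 2):
        super().__init__()
        self.n_heads, self.d_head, self.rank = n_heads, d_head, rank
        self.head_factor = nn.Linear(d_model, rank * n_heads)  # a_r(x_t) in R^{n_heads}
        self.dim_factor = nn.Linear(d_model, rank * d_head)    # b_r(x_t) in R^{d_head}

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model)
        B, T, _ = x.shape
        a = self.head_factor(x).view(B, T, self.rank, self.n_heads)
        b = self.dim_factor(x).view(B, T, self.rank, self.d_head)
        # K_t = (1/rank) * sum_r a_r outer b_r  ->  (batch, seq, n_heads, d_head)
        k = torch.einsum("btrh,btrd->bthd", a, b) / self.rank
        # cache (a, b): rank * (n_heads + d_head) numbers per token instead of n_heads * d_head
        return k, (a, b)
```
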
Yifan Zhang (@yifan_zhang_)

1/
Introducing “Tensor Product Attention Is All You Need” (TPA) and Tensor ProducT ATTenTion Transformer (T6)! 🚀

Ever wondered if there’s a more memory-efficient way to handle long contexts in LLMs? 

Homepage: tensorgi.github.io/T6
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Tensor Product Attention Is All You Need

Proposes Tensor Product Attention (TPA), a mechanism that factorizes Q, K, and V activations using contextual tensor decompositions to achieve a 10x or more reduction in inference-time KV cache size relative to the standard attention mechanism.
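
Back-of-envelope arithmetic for the "10x or more" claim, under assumed sizes (the head configuration and ranks below are illustrative, not taken from the paper's experiments): per token and per layer, standard multi-head attention caches the full K and V, while a rank-2 factorization caches only the factors.

```python
n_heads, d_head = 32, 128                           # assumed attention configuration
rank_k = rank_v = 2                                 # assumed TPA ranks for K and V

mha_cache = 2 * n_heads * d_head                    # full K + V      -> 8192 numbers/token/layer
tpa_cache = (rank_k + rank_v) * (n_heads + d_head)  # cached factors  -> 640 numbers/token/layer
print(mha_cache / tpa_cache)                        # 12.8, i.e. "10x or more" under these assumptions
```
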
Quanquan Gu (@quanquangu)

MHA-->GQA-->MLA--->TPA🚀🚀🚀 Introducing Tensor Product Attention (TPA). To reduce KV cache size, various Multi-Head Attention (MHA) variants have been developed, including Multi-Query Attention (MQA), Group Query Attention (GQA), and Multi-Head Latent Attention (MLA). GQA has

Krishna Mohan (@kmohan2006)

1/n

'Tensor Product Attention is all you need' paper 
Key Points ->

1. KV size reduction by using contextual tensor decomposition for each token 
2. Dividing hidden_dimension for each token into head dimension factor and token dimension factor and then combining using tensor
Quanquan Gu (@quanquangu)

Very cool! Who’d like to use FlashTPA? Drop a like if you want us to release it! MHA-->GQA-->MLA--->TPA🚀🚀 Paper: arxiv.org/pdf/2501.06425

Zhouliang Yu (@zhouliangy)

🚀 Excited to introduce FormalMATH: a large-scale formal math benchmark with 5,560 formally verified Lean 4 statements from Olympiad and UG-level problems.

📉 Best model performance: just 16.46% — plenty of room for progress!

🔗 Explore the project: spherelab.ai/FormalMATH/
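
For readers unfamiliar with the format, here is a small Lean 4 (with Mathlib) example of the general kind of formal statement such a benchmark collects; this toy inequality is purely illustrative and is not taken from FormalMATH.

```lean
import Mathlib

-- Toy Olympiad-style statement (illustrative only, not from the benchmark):
-- for real a and b, a^2 + b^2 ≥ 2ab.
theorem toy_ineq (a b : ℝ) : a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  nlinarith [sq_nonneg (a - b)]
```
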
Oriol Vinyals (@oriolvinyalsml)

Ahead of I/O, we’re releasing an updated Gemini 2.5 Pro! It’s now #1 on the WebDevArena leaderboard, breaking the 1400 Elo barrier! 🥇

Our most advanced coding model yet, with stronger performance on code transformation & editing. Excited to build drastic agents on top of this!
Gro-Tsen (@gro_tsen)

[Via jeanas.bsky.social on the non-Musky place.] And yes, this monstrosity is an actual commutative diagram from an actual math paper: “Comma 2-comonad I: Eilenberg-Moore 2-category of colax coalgebras” by Igor Baković arxiv.org/abs/2505.00682 (on page 53).