Jackson (@jacksonkekhw)'s Twitter Profile
Jackson

@jacksonkekhw

AI Engineer working on LLM applications.

ID: 1289768034614140932

Link: https://jacksoncakes.com/

Joined: 02-08-2020 03:40:44

47 Tweets

115 Followers

90 Following

zed (@zmkzmkz)

EARLY PREPRINT:
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Why do we use softmax in attention, even though we don’t really need non-zero probabilities that sum to one, causing attention sink and large hidden state activations?

Let that sink in.

Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu)

(1/8)🍎A Galileo moment for LLM design🍎
As Pisa Tower experiment sparked modern physics, our controlled synthetic pretraining playground reveals LLM architectures' true limits. A turning point that might divide LLM research into "before" and "after." physics.allen-zhu.com/part-4-archite…