TimDarcet

@timdarcet

PhD student, building big vision models @ INRIA & FAIR (Meta)

ID: 1371396662925606913

15-03-2021 09:44:31

982 Tweets

3.3K Followers

728 Following

4 months ago

Summary of "Massive activations in LLMs":
- "artifact" tokens are in all transformers, ViTs and LLMs
- their weirdness is ~only on 1 channel
- they are the same as the quantization outliers
- their purpose is *not* global information
- there's a fix simpler than registers
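To make the "~only on 1 channel" point concrete, here is a minimal sketch of how one might flag such outliers in a hidden-state matrix. It runs on synthetic data, not a real model: the function names, the planted outlier, and both thresholds (`ratio`, `min_abs`) are illustrative assumptions, not values stated in the tweet.

```python
import random
from statistics import median


def find_massive_activations(hidden, ratio=1000.0, min_abs=100.0):
    """Return (token, channel, value) triples whose magnitude dwarfs the median.

    hidden: list of token vectors (list of lists of floats).
    ratio, min_abs: illustrative thresholds (assumptions for this sketch).
    """
    magnitudes = [abs(v) for row in hidden for v in row]
    med = median(magnitudes)
    return [
        (t, c, v)
        for t, row in enumerate(hidden)
        for c, v in enumerate(row)
        if abs(v) >= min_abs and abs(v) >= ratio * med
    ]


def make_synthetic_hidden(num_tokens=8, num_channels=16, seed=0):
    """Build a toy hidden state: small random values plus one planted outlier."""
    rng = random.Random(seed)
    hidden = [
        [rng.uniform(-1.0, 1.0) for _ in range(num_channels)]
        for _ in range(num_tokens)
    ]
    hidden[0][5] = 2500.0  # hypothetical "artifact" token 0, channel 5
    return hidden


hidden = make_synthetic_hidden()
outliers = find_massive_activations(hidden)
# Collect the distinct channels that carry an outlier.
outlier_channels = sorted({c for _, c, _ in outliers})
print(outliers)
print(outlier_channels)
```

On a real model one would apply the same check to per-layer hidden states; if the tweet's summary holds, the flagged entries should cluster on very few channels.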

117 likes

4 replies

8 retweets
