OK, there's a new paper in my top 3 favorites:
Vision Transformers Need Registers
Clear problem, elegant solution, well written, easy to understand, good results, limitations included.
No fancy losses or layers. No equations (at all!)
Here's a short summary: (1/4)
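The gist of the fix, as I understand it: append a few extra learnable "register" tokens to the ViT's input sequence and throw them away at the output, giving attention somewhere to stash global computation other than patch tokens. A minimal PyTorch sketch of that idea; the module and names here are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Illustrative sketch: a plain ViT encoder plus learnable register tokens."""
    def __init__(self, encoder: nn.Module, dim: int, num_registers: int = 4):
        super().__init__()
        self.encoder = encoder  # any transformer encoder over [B, N, D] tokens
        self.num_registers = num_registers
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        nn.init.trunc_normal_(self.registers, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.shape[0]
        regs = self.registers.expand(b, -1, -1)      # one copy per image
        x = torch.cat([patch_tokens, regs], dim=1)   # append registers to the sequence
        x = self.encoder(x)
        return x[:, : -self.num_registers]           # discard registers at the output
```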
EARLY PREPRINT:
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Why do we use softmax in attention when we don't actually need non-zero probabilities that sum to one? Those constraints are what cause attention sinks and massive hidden-state activations.
Let that sink in.
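The fix, as I read the preprint, is a rectified softmax: relu(exp(x) − 1) in the numerator, the sum of |exp(x) − 1| in the denominator, so entries can be exactly zero and rows need not sum to one. A hedged PyTorch sketch; check the paper for the exact definition:

```python
import torch

def softpick(x: torch.Tensor, dim: int = -1, eps: float = 1e-6) -> torch.Tensor:
    """Rectified-softmax sketch: relu(exp(x) - 1) / (eps + sum|exp(x) - 1|).

    My reading of the preprint's formula, not a verified reference
    implementation. Unlike softmax, entries can be exactly 0 and rows
    need not sum to 1, so no token is forced to receive attention mass.
    """
    m = x.max(dim=dim, keepdim=True).values
    # Scale numerator and denominator by exp(-m) for numerical stability;
    # the shared positive factor cancels, so the ratio is unchanged.
    e = torch.exp(x - m) - torch.exp(-m)   # = exp(-m) * (exp(x) - 1)
    num = torch.relu(e)
    den = e.abs().sum(dim=dim, keepdim=True)
    return num / (den + eps)               # eps guards the all-zero row
```

Used as a drop-in for softmax over the attention-score matrix, e.g. `softpick(q @ k.transpose(-2, -1) / d**0.5)`.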
Make your RAG application 10x smarter!
ColiVara is a unique document retrieval method that does not need chunking or text processing. It still feels like RAG but without OCR, text extraction, broken tables, or missing images.
What you see is what you get. ✨
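Under the hood this is, to my understanding, ColPali-style visual retrieval: embed each page as an image, embed the query, and rank by MaxSim late interaction. A hypothetical sketch of that scoring, not ColiVara's actual API; all names here are made up:

```python
import torch

def late_interaction_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> torch.Tensor:
    """MaxSim late-interaction score (ColBERT/ColPali style).

    query_emb: [Nq, D] multi-vector embedding of the query
    page_emb:  [Np, D] multi-vector embedding of one page *screenshot*
    Each query vector picks its best-matching page vector; picks are summed.
    """
    sim = query_emb @ page_emb.T           # [Nq, Np] dot-product similarities
    return sim.max(dim=1).values.sum()     # MaxSim over page patches, per query token

def rank_pages(query_emb: torch.Tensor, pages: list[torch.Tensor]) -> list[int]:
    """Hypothetical retrieval loop: rank page images directly, no OCR or chunking."""
    scores = [late_interaction_score(query_emb, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
```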
Mathematical Theory of Deep Learning
This book will help you build an understanding of the fundamental mathematical concepts behind deep learning.
Pages: 255