Manuel Faysse (@manuelfaysse)'s Twitter Profile
Manuel Faysse

@manuelfaysse

NLP (LLMs) & ML Privacy -
🥐CroissantLLM
PhD Candidate @CentraleSupelec
Prev: @imperialcollege, @epfl, @La_UPM

ID: 2220306764

https://manuelfay.github.io/

Joined: 28-11-2013 20:22:28

185 Tweets

921 Followers

273 Following

Aixin Sun 孙爱欣 (@aixinsg)

Slides for my talk on NLP by Vision Language Models. personal.ntu.edu.sg/axsun/slides/N… I started with UTF-8, which makes language storage transparent to language processing. Then, LLMs make the traditional NLP pipeline (e.g., POS tagging, parsing, NER) transparent to NLP applications.

Manuel Faysse (@manuelfaysse)

It was not obvious that pooling image tokens would induce only minimal performance degradation, as it does with text tokens! Large redundancies do exist between some patches (e.g. empty white patches), but we had seen that even those were useful as reasoning buffers, so these are very exciting results! 🚀
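
For readers unfamiliar with token pooling, here is a minimal sketch of the general idea. The tweet does not say which pooling method was used, so averaging runs of consecutive patch vectors is purely an assumption made for illustration, and `mean_pool` and `pool_factor` are hypothetical names.

```python
def mean_pool(patch_embeddings, pool_factor):
    """Average each run of `pool_factor` consecutive patch vectors into one.

    Assumption: consecutive mean pooling. The actual grouping strategy
    (e.g. clustering similar patches) is not described in the tweet.
    """
    if pool_factor < 1:
        raise ValueError("pool_factor must be at least 1")
    pooled = []
    for start in range(0, len(patch_embeddings), pool_factor):
        group = patch_embeddings[start:start + pool_factor]
        dim = len(group[0])
        # Component-wise mean over the vectors in this group.
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


# Three toy 2-d "patch embeddings" pooled by a factor of 2 give two vectors;
# the last group holds the single leftover vector.
patches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(patches, pool_factor=2)
```

With a pooling factor of k, the number of stored vectors per page drops by roughly a factor of k, which is where the storage and compute savings would come from.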

Manuel Faysse (@manuelfaysse)

Fun ColPali finding of the day: training a LoRA adapter on top of the "mix" version of PaliGemma, then using this adapter with the "pt" base model version, actually leads to better results (+2% DocVQA)! Crazy how many inference-time optimizations exist!

Manuel Faysse (@manuelfaysse)

Enough people asked, so we obliged: here's the entire ColPali training set: huggingface.co/datasets/vidor… We hope this can help bootstrap some ColPali fine-tuning efforts, and we're eager to see cool work from the community!

Manuel Faysse (@manuelfaysse)

Super happy that people agree with the main takeaway from our ColPali paper: the future of Document AI is doing everything in vision space, not over-engineering brittle text extraction pipelines!

Benjamin Clavié (@bclavie)

RAG is increasingly going multi-modal, but document retrieval is tough, and layout gets in your way. But it shouldn't!

Introducing 🪤RAGatouille's Vision-equipped, ColPali-powered sibling: 🐭Byaldi

With just a few lines of code, search through documents, with no pre-processing.
Jo Kristian Bergum (@jobergum)

With 200M Hamming distances per second per CPU core over 128d binary ColPali embeddings, we are ready to tackle billion-scale PDF datasets. Harvesting the power of VLMs. Storage footprint of ColPali with binary embeddings is the same as for 7B embedding models using 4096
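
As a rough illustration of what a Hamming distance over 128-bit binary embeddings involves, here is a minimal pure-Python sketch. It is not the implementation behind the throughput figure quoted in the tweet: the sign-based binarization, the helper names (`binarize`, `hamming`, `nearest`), and the random toy data are all assumptions. It also covers only the per-vector distance, not the multi-vector late-interaction scoring that ColPali uses over a whole page.

```python
import random

DIM = 128  # bits per embedding, matching the "128d" in the tweet


def binarize(vector):
    """Pack a float vector into one int: bit i is set when vector[i] > 0.

    Assumption: simple sign thresholding; the actual binarization scheme
    is not described in the tweet.
    """
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Count the bit positions where two packed embeddings differ."""
    return bin(a ^ b).count("1")


def nearest(query, candidates):
    """Return the index of the candidate closest to `query` in Hamming distance."""
    return min(range(len(candidates)), key=lambda i: hamming(query, candidates[i]))


# Toy data: 1000 random float vectors, binarized to 128 bits (16 bytes) each.
rng = random.Random(0)
docs = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(1000)]
packed = [binarize(d) for d in docs]

query = binarize(docs[42])
best = nearest(query, packed)
```

The XOR-then-popcount pattern is what makes this comparison cheap: each 128-bit vector occupies 16 bytes, and native implementations can compare two of them in a handful of instructions.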