Raphaël Sourty (@raphaelsrty) 's Twitter Profile
Raphaël Sourty

@raphaelsrty

Language Models, Knowledge Bases, Knowledge Distillation PhD | AI @LightonIO

ID: 1262679654239961088

Link: https://github.com/raphaelsty
Joined: 19-05-2020 09:41:23

313 Tweets

395 Followers

633 Following

Antoine Chaffin (@antoine_chaffin) 's Twitter Profile Photo

LightOn joins the OCR mania. We release a 1B model achieving SOTA results while being much faster than all the recent releases. It is also an end-to-end trainable solution for easy adaptation to your specific domains. We also share interesting insights (and soon the dataset!)

Iacopo Poli (@iacopo_poli) 's Twitter Profile Photo

The recipe for a fast, performant OCR model:
1. Tell Said that OCR is solved
2. Let him rage about the state of OCR
3. Get a few smart people in a GMeet with him
4. Tell them there are GPUs available
5. Wait a bit
6. Enjoy 🦉
Soon deployed in your favorite Enterprise environments

Oskar Hallström (@oskar_hallstrom) 's Twitter Profile Photo

The last few days have been insane in OCR land, with releases from DeepSeek, PaddlePaddle and others. Now we at LightOn are entering the game with our latest release, pushing the state of the art even further. Kudos to staghado, Baptiste Aubertin and Adrien Cavaillès 🥳

Daniel van Strien (@vanstriendaniel) 's Twitter Profile Photo

Vibe checks of the new LightOn OCR model on some 20th Century digitised books (from National Library of Scotland). This collection has existing OCR (turns out OCR was a thing before VLMs 😜) Vs the old OCR: Paragraph structure better maintained

Raphaël Sourty (@raphaelsrty) 's Twitter Profile Photo

Talk from my LightOn buddy Amélie Chatelain next week with Weights & Biases in London. A fresh perspective on the cost of LLMs at scale when you feed a decent amount of context into their prompts. I won't spoil the moral of the story, but retrieval might be helpful there.

staghado (@staghado) 's Twitter Profile Photo

A Halloween gift 🎃
New finetuning notebook for LightOnOCR-1B:
• Supports both Full and LoRA training
• Supports FineVision 🤗 subsets incl. OlmOCR-mix & handwritten IAM
• ~12 min/epoch on one H100 (can also run on Colab!)

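The LoRA option mentioned above trains small low-rank adapter matrices instead of the full weight matrices. This is a minimal NumPy sketch of the idea only (not the notebook's actual code); the dimensions, rank, and scaling factor are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained weight matrix (frozen during LoRA fine-tuning).
d_out, d_in = 64, 64
W = rng.standard_normal((d_out, d_in))

# Low-rank adapter: only A and B are trained. Rank r and scaling
# alpha are hypothetical values, not taken from the notebook.
r, alpha = 8, 16
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero, so the adapter is a no-op initially

def forward(x):
    # Effective weight: W + (alpha / r) * B @ A
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((1, d_in))
# With B = 0, the adapted model matches the frozen model exactly.
assert np.allclose(forward(x), x @ W.T)

# Trainable-parameter savings versus full fine-tuning:
full_params = W.size           # 64 * 64 = 4096
lora_params = A.size + B.size  # 8 * 64 + 64 * 8 = 1024
print(f"trainable params: {lora_params} vs {full_params}")
```

Because only `A` and `B` receive gradients, the optimizer state and checkpoint size shrink accordingly, which is what makes LoRA runs feasible on a single H100 or a Colab GPU.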