Dan | Machine Learning Engineer (@dankornas)'s Twitter Profile
Dan | Machine Learning Engineer

@dankornas

End-to-End ML Engineer. Building the best AI learning resource at ailearninghub.io. YouTube: youtube.com/@dankornas

ID: 1406639376675770374

Link: https://forms.gle/KqvbKr54iVyNXYHu6
Joined: 20-06-2021 15:46:18

8.8K Tweets

75.75K Followers

491 Following

merve (@mervenoyann)

stop building parser pipelines 👋🏻 there's a new document parser that is small, fast, Apache 2.0 licensed and is better than all the other ones! 😱 MonkeyOCR is a 3B model that can parse everything (charts, formulas, tables etc) in a document 🤠

Logan Thorneloe (@loganthorneloe)

The best thing junior engineers can do early in their career is learn to solve problems from the problem statement. Not being told what to build—being given a problem and deciding that for themselves. This skill is directly transferrable and foundational to building ML systems.

merve (@mervenoyann)

Dolphin: new OCR model by ByteDance with MIT license 🐬 the model first detects elements in the layout (table, formula etc) and then parses each element in parallel for generation ⤵️ model and demo are on Hugging Face Hub 🤗
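The detect-then-parse-in-parallel flow described in the tweet can be sketched roughly as below. This is a minimal illustration, not Dolphin's actual code: `Element`, `detect_layout`, `parse_element`, and `parse_page` are hypothetical names, and both stages are plain-string stand-ins for what would be model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    kind: str     # e.g. "table", "formula", "text"
    content: str  # stand-in for the cropped page region


def detect_layout(page: str) -> list[Element]:
    """Hypothetical stage 1: split a page into typed elements, in reading order."""
    elements = []
    for line in page.splitlines():
        if not line.strip():
            continue
        kind, _, content = line.partition(":")
        elements.append(Element(kind.strip(), content.strip()))
    return elements


def parse_element(element: Element) -> str:
    """Hypothetical stage 2: turn a single element into output text."""
    return f"[{element.kind}] {element.content}"


def parse_page(page: str, workers: int = 4) -> list[str]:
    elements = detect_layout(page)
    # Executor.map returns results in input order, so reading order is kept
    # even though the elements are parsed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_element, elements))
```

Since each element is parsed independently once the layout is known, the second stage can be batched or run concurrently without changing the output order.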

ℏεsam (@hesamation)

this is the most organized structure of an ai project. it’s not just about clean code, it’s also easy to navigate for LLMs and Cursor. always separate config from code, and notebooks from src code. Nina actually turned this into a repo template (in replies)
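As a rough illustration of "separate config from code" (not the layout from the tweet's image or Nina's template), the sketch below keeps hyperparameters in a JSON file and has the code only read them. `TrainConfig`, `load_config`, `total_steps`, and the file name `train.json` are all hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    batch_size: int
    epochs: int


def load_config(path: Path) -> TrainConfig:
    """Read settings from a JSON file instead of hard-coding them in src code."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return TrainConfig(**raw)


def total_steps(config: TrainConfig, num_examples: int) -> int:
    """Code depends only on the config object, never on literal values."""
    steps_per_epoch = -(-num_examples // config.batch_size)  # ceiling division
    return steps_per_epoch * config.epochs


if __name__ == "__main__":
    # Demo only: write a throwaway config file, then load it.
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = Path(tmp) / "train.json"  # hypothetical file name
        cfg_path.write_text(
            json.dumps({"learning_rate": 0.001, "batch_size": 32, "epochs": 3}),
            encoding="utf-8",
        )
        print(total_steps(load_config(cfg_path), num_examples=1000))
```

Changing an experiment then means editing the config file, not the source.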

Dan | Machine Learning Engineer (@dankornas)

I've been working on extracting data from scanned documents and I have been having a horrible experience with PyTesseract... it only works well on perfect toy data. Luckily, I found PaddleOCR and so far, it is working perfectly. However, the main challenge now is speed.