Yuliang-Liu (@yl07342021) Twitter Tweets • TwiCopy

Yuliang-Liu

@yl07342021

Faculty in Huazhong University of Science and Technology. Focusing on Document Intelligence.

ID: 1243664013990301696

Link: https://github.com/Yuliang-Liu

Joined: 27-03-2020 22:19:49

18 Tweets

18 Followers

11 Following

Rohan Paul

@rohanpaul_ai

a year ago

Document parsing struggles with error accumulation in pipelines and slowness in large end-to-end models. MonkeyOCR solves this with a Structure-Recognition-Relation paradigm, balancing speed and accuracy. Methods 🔧: → It first detects semantic regions using a YOLO-based

11 likes · 1 reply · 1 retweet

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

a year ago

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
- +5.1% avg gain over MinerU across 9 doc types (+15.0% formulas, +8.6% tables)
- Beats Gemini 2.5 Pro & Qwen2.5-VL-72B on English docs with just 3B params
- Faster parsing: 0.84 pages/sec (vs

28 likes · 1 reply · 12 retweets

AIGCLINK

@aigclink

a year ago

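A newly released lightweight, LLM-based document parsing model: MonkeyOCR. It is both accurate and fast: its 3B model outperforms Gemini 2.5 Pro and Qwen2.5-VL-72B on average on English document parsing tasks. For multi-page document parsing, it processes 0.84 pages per second, ahead of MinerU at 0.65 pages/sec and Qwen2.5-VL-7B at 0.12 pages/sec.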

299 likes · 5 replies · 54 retweets

出家如初

@chuanliang

a year ago

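Daily software picks, 2025-06-10
1. AI software
MonkeyOCR github.com/Yuliang-Liu/Mo… A lightweight LMM-based document parsing model; the project claims better results than MinerU, Gemini 2.5 Pro, and Qwen2.5 VL-72B.
Veo 3 Directory veo3.directory A collection site for Google Veo video generation.
daily-arXiv-ai-enhanced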

2 likes · 0 replies · 1 retweet

Yuliang-Liu

@yl07342021

10 months ago

We’ve just released MonkeyOCR-pro-3B, along with a leaner and faster variant — MonkeyOCR-pro-1.2B! MonkeyOCR-pro-3B outperforms Gemini 2.0-Flash, Gemini 2.5-Pro, Qwen2.5-VL-72B, GPT-4o, and InternVL3-78B on OmniDocBench. Check it out: github.com/Yuliang-Liu/Mo…

5 likes · 1 reply · 1 retweet

Yuliang-Liu

@yl07342021

6 months ago

🚀 Introducing MonkeyOCR v1.5 — our new multimodal model with general document parsing capabilities that significantly outperform paddleOCR and DeepSeek OCR. 🎉 A major milestone: complex table parsing accuracy breaks 90% for the first time! 🔗 Paper: arxiv.org/abs/2511.10390…

1 like · 0 replies · 0 retweets