OpenDataLab (@opendatalab_ai)'s Twitter Profile
OpenDataLab

@opendatalab_ai

ID: 1623876801092255747

10-02-2023 02:49:46

30 Tweets

145 Followers

42 Following


🎉 Upgrade Notice: MinerU is an LLM-powered tool that converts PDFs into machine-readable formats. Version 0.7.1 is now available. It adds an option to integrate the paddle tablemaster table recognition model, which speeds up table processing. 🚀 github.com/opendatalab/Mi…


We present DocLayout-YOLO, a layout detection model suited to diverse document types, including but not limited to papers, textbooks, test papers, and slides. ✨ GitHub: github.com/opendatalab/Do… 📜 Paper: arxiv.org/abs/2410.12628 💻 Demo: huggingface.co/spaces/opendat…


📢 MinerU New Year Update: January 2025 Highlights
Here are the new features:
● Brand Visual Revamp: A complete redesign of the MinerU brand, along with the official relaunch of our website, offering easy access to technical documentation. Visit us at: mineru.net


● Official Client Release: Download and use with no programming required. Simply drag and drop to quickly process multiple documents for extraction without the need for login.


● Online API Services & Demo: Aligned with the latest model capabilities, optimized resource scheduling strategies, and enhanced batch processing capabilities.


‼️ Important Notice: The v2 and v3 APIs are now discontinued. Please migrate to the new v4 API, available under the new domain, and create a new token for continued use.

"Wanjuan2.0" is a multilingual and multimodal #corpus that comprises four #data modalities: full text, image-text, video, and #SFT, totaling 11.5 million data entries and covering Russian, Arabic, Korean, Vietnamese, Thai, etc.
Open-source link: opendatalab.com/applyMultiling…

The multilingual and multimodal #corpus "Wanjuan2.0" has been open-sourced on Hugging Face. It offers ultra-fine-grained #data and applies to multiple scenarios, such as cultural tourism, commercial trade, and science and technology education. Free download: huggingface.co/datasets?sort=…


What is your ideal data processing #tool? Try #MinerU, a professional assistant that helps you get #AI-READY #data. Explore MinerU's core functions to find the ones you need!


Are you looking for a tool to help you label #data? Try #LabelU, a flexible labeling tool that supports #CV, voice interaction, and #AI-assisted labeling. 👉 labelu.shlab.tech/tasks/


Document content analysis has been a crucial research area in computer vision. We present #MinerU, an open-source solution for high-precision document content extraction. Deep dive into MinerU via the technical report: mineru.site/Saaas%E6%9C%8D…


We are very pleased to learn that one of our users has just launched a website about #MinerU! The website offers open-source solutions for data processing, tutorials, shared usage experience, and more. Join the community: mineru.site


The open-source dataset WanJuanSiLu is designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models. WanJuanSiLu mainly consists of eight subsets, including Thai, Russian, Arabic, Korean, and Hungarian.

Andrew Ng (@andrewyng)

Agentic Document Extraction just got much faster! Median processing time is down from 135 sec to 8 sec. It extracts not just text but also diagrams, charts, and form fields from PDFs to give LLM-ready output. Please see the video for details and some application ideas.