OpenGVLab (@opengvlab) 's Twitter Profile
OpenGVLab

@opengvlab

Shanghai AI Lab, General Vision Team. We created InternImage, BEVFormer, VideoMAE, LLaMA-Adapter, Ask-Anything, and many more! [email protected]

ID: 1610948392489979904

Link: https://github.com/OpenGVLab
Joined: 05-01-2023 10:36:53

133 Tweets

1.1K Followers

87 Following

OpenGVLab (@opengvlab) 's Twitter Profile Photo

CharXiv is <a href="/zwcolin/">Zirui "Colin" Wang</a> 's excellent work in evaluating the chart understanding ability of #mllm. InternVL2-Llama3-76B is the best open-source model for this domain. BTW the song that summarizes the key findings is creative!  I love it!
👍CharXiv leaderbord and the song:
OpenGVLab (@opengvlab) 's Twitter Profile Photo

Flexible photo-realistic image and vision-language generalist using a simple decoder-only transformer! #GenAI model LUMINA-mGPT's demo video is on YouTube!
📺youtu.be/YqNc8Y-cCs0?si…
🚀Code: github.com/Alpha-VLLM/Lum…
Paper: arxiv.org/abs/2408.02657

OpenGVLab (@opengvlab) 's Twitter Profile Photo

Here comes Mini-InternVL 2.0! 🚀With just 5% of the parameters, it delivers 90% of the performance!
arxiv👏: arxiv.org/abs/2410.16261
repos👉: github.com/OpenGVLab/Inte…
1B version🤗: huggingface.co/OpenGVLab/Inte…
2B version🤗: huggingface.co/OpenGVLab/Inte…
4B version🤗:

OpenGVLab (@opengvlab) 's Twitter Profile Photo

The tech report is worth reading. It reveals many details about how InternVL 1.5, InternVL 2.0, and now InternVL 2.5 have remained the best open-source #vlm foundation models over time. huggingface.co/papers/2412.05…

OpenGVLab (@opengvlab) 's Twitter Profile Photo

🥳We have released InternVL2.5, ranging from 1B to 78B, on <a href="/huggingface/">Hugging Face</a> .

😉InternVL2_5-78B is the first open-source #MLLM to achieve over 70% on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o.

🤗HF Space:
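
For readers who want to try the release: a minimal sketch (not an official snippet) of loading one of the smaller InternVL2.5 checkpoints from Hugging Face for a text-only chat turn, following the usage pattern shown on the OpenGVLab model cards. The exact model ID, dtype, and `chat` signature should be checked against the card of the checkpoint you pick.

```python
# Minimal sketch, assuming the Hugging Face model ID "OpenGVLab/InternVL2_5-8B"
# and the custom `chat` helper documented on the InternVL model cards.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2_5-8B"  # smaller sibling of the 78B flagship
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # bf16 keeps memory manageable on GPU
    low_cpu_mem_usage=True,
    trust_remote_code=True,       # InternVL ships its own modeling/chat code
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

generation_config = dict(max_new_tokens=256, do_sample=False)
question = "Hello, who are you?"
# pixel_values=None -> text-only turn; pass preprocessed image tensors for VQA.
response, history = model.chat(
    tokenizer, None, question, generation_config, history=None, return_history=True
)
print(response)
```
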
OpenGVLab (@opengvlab) 's Twitter Profile Photo

We have reached a milestone by exceeding human performance on the R2R dataset in vision-language navigation for the very first time.

OpenGVLab (@opengvlab) 's Twitter Profile Photo

People are paying more and more attention to the quality and details of generated videos. Use a single hand-tuned temperature parameter to enhance your generated videos for free! Nice work with our amazing friends Yang Luo, Xuanlei Zhao, Wenqi Shaw, Victor.Kai Wang, VITA Group,
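
The work referenced here hand-tunes a single temperature on the video model's temporal attention. As a rough, hypothetical illustration of what such a knob does (not the authors' actual implementation), here is a small NumPy sketch of temperature-scaled attention over per-frame features:

```python
# Toy sketch of temperature-scaled attention over frame features.
# This only illustrates the general idea of a temperature knob; the referenced
# work applies it inside a video generation model's temporal attention layers.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temperature_scaled_attention(q, k, v, temperature=1.0):
    """Dot-product attention with softmax(logits / temperature).

    temperature < 1 sharpens the attention (each frame attends to fewer frames);
    temperature > 1 smooths it (each frame mixes information from more frames).
    """
    d = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    weights = softmax(logits / temperature, axis=-1)
    return weights @ v

# Toy usage: 8 "frames" with 16-dimensional features attending to each other.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
out = temperature_scaled_attention(frames, frames, frames, temperature=2.0)
print(out.shape)  # (8, 16)
```
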

OpenGVLab (@opengvlab) 's Twitter Profile Photo

🥳Mini-InternVL has been accepted by Visual Intelligence! The Mini-InternVL series of #MLLMs, with parameters ranging from 1B to 4B, achieves 90% of the full-scale model's performance using only 5% of the parameters. This significant efficiency and performance boost makes our model more accessible

OpenGVLab (@opengvlab) 's Twitter Profile Photo

🚀 Introducing #InternVideo 2.5 - The Video Multimodal AI That Sees Longer & Smarter!
✨ Handles videos 6x longer than predecessors
✨ Pinpoints objects/actions with surgical precision
✨ Trained on 300K+ hours of diverse video data
📈 Outperforms SOTA on multiple benchmarks &
OpenGVLab (@opengvlab) 's Twitter Profile Photo

🚀 Introducing MM-Eureka Series - A Breakthrough in Multimodal Reasoning with Visual Aha Moments!
✨ Reproduced R1-Zero and Visual Aha-Moment Phenomena
🧠 Trained on only 0.05% of the data used for base models, it achieves comparable benchmark math reasoning performance to
OpenGVLab (@opengvlab) 's Twitter Profile Photo

🥳We have released #InternVL3, an advanced #MLLM series ranging from 1B to 78B, on <a href="/huggingface/">Hugging Face</a>.

😉InternVL3-78B achieves a score of 72.2 on the MMMU benchmark, setting a new SOTA among open-source MLLMs.

☺️Highlights:
- Native multimodal pre-training: Simultaneous language and