Yuanhan (John) Zhang (@zhang_yuanhan)'s Twitter Profile
Yuanhan (John) Zhang

@zhang_yuanhan

3rd-year Ph.D. student at @MMLabNTU

ID: 1302042473116495874

Link: https://zhangyuanhan-ai.github.io/ · Joined: 05-09-2020 00:35:05

210 Tweets

862 Followers

252 Following

Mu Cai (@mucai7)

Now TemporalBench is fully public! See how your video understanding model performs on TemporalBench before CVPR! 

🤗 Dataset: huggingface.co/datasets/micro…
📎 Integrated into lmms-eval (systematic eval): github.com/EvolvingLMMs-L… (great work by Chunyuan Li and Yuanhan (John) Zhang)
📗 Our
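
For readers who want to try the benchmark before a full harness run, below is a minimal sketch of loading the dataset from the Hugging Face Hub and scoring answers directly. The dataset ID and field names are assumptions (the URLs in the tweet are truncated); the authoritative entry points are the dataset card and the lmms-eval task config.

```python
# Hedged sketch: pull TemporalBench from the Hub and compute plain accuracy.
# DATASET_ID and the field names are hypothetical -- verify on the dataset card.
from datasets import load_dataset

DATASET_ID = "microsoft/TemporalBench"  # hypothetical ID, inferred from the truncated URL

def evaluate(answer_fn, split: str = "test") -> float:
    """answer_fn(video, question) -> the model's answer string."""
    ds = load_dataset(DATASET_ID, split=split)
    correct = 0
    for ex in ds:  # assumed fields: "video", "question", "answer"
        pred = answer_fn(ex["video"], ex["question"])
        correct += int(pred.strip().lower() == ex["answer"].strip().lower())
    return correct / len(ds)
```
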
Zhenyu Jiang (@stevetod1998)

Excited to share that we’re organizing the Generative Models for Robot Learning workshop at #ICLR2025! Join us and submit your paper!

Xiang Yue@ICLR2025🇸🇬 (@xiangyue96)

Introducing Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
videommmu.github.io

One significant difference of Video-MMMU is that it does not only measure the "absolute" accuracy of models but also measures the "delta" accuracy, where
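
The tweet is cut off before the definition, but the "delta" idea can be illustrated with a small sketch: query the model on the same questions before and after it watches the video, then normalize the gain by the remaining headroom. The normalization below is one plausible formulation, not necessarily the paper's exact metric.

```python
# Hedged sketch of a "delta" (knowledge-gain) metric: improvement after
# watching the video, normalized by how much room there was to improve.
def delta_accuracy(acc_before: float, acc_after: float) -> float:
    if acc_before >= 1.0:
        return 0.0  # no headroom left to gain
    return 100.0 * (acc_after - acc_before) / (1.0 - acc_before)

# e.g. going from 40% to 55% accuracy after watching gives a delta of 25.0
print(delta_accuracy(0.40, 0.55))
```
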
Kairui Hu (@kairuicarry)

🚀 Updated Video-MMMU Leaderboard for Qwen-2.5-VL!

Congrats to Qwen! 🎉 Qwen-2.5-VL-72B achieves GPT-4o-level performance on Video-MMMU and reaches a high ΔKnowledge, marking a significant advance among open-source models! 🏆
Qwen-2.5-VL-7B achieves SOTA performance
Yuanhan (John) Zhang (@zhang_yuanhan)

Introducing Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos 

Video-MMMU goes beyond simply assessing how video LLMs comprehend professional content. 

We explore how these models apply the acquired knowledge to tackle new tasks: By
Felix Juefei Xu (@felixudr)

📣The Second Workshop on Efficient and On-Device Generation (EDGE) at CVPR 2025 will focus on the latest advancements of generative AI in the computer vision domain, with an emphasis on efficiencies across multiple aspects. We encourage techniques that enable generative models to
AIGCLINK (@aigclink)

Cool, a daily-life AI assistant: EgoLife. It works like a personal assistant that helps you remember important events, track habits, recall past events, and manage tasks. Data is collected through Meta Aria glasses, recorded with cameras and sensors. It has two main components. EgoGPT: understands what you are doing in real time, can make sense of the sounds around you, and automatically logs important events

TwelveLabs (twelvelabs.io) (@twelve_labs)

Kairui Hu will present Video-MMMU - the first benchmark designed to evaluate how effectively Large Multimodal Models (LMMs) acquire knowledge from professional videos. x.com/kairuicarry/st…

Max Forbes (@maxforbes)

working on a post that's basically "how to get a paper accepted," using a case study of one of my own that went from reject (2.5, 3, 3) to accept (4, 4.5, 4.5) with just one week of revisions
VidLLMs CVPR2025 (@vidllms)

🚨 🚨 🚨 News 🚨 🚨 🚨
Paper Submission for the Video LLMs Workshop at #CVPR2025 is open now!

Call for papers: crcv.ucf.edu/cvpr2025-vidll…
OpenReview: openreview.net/group?id=thecv…

#CVPR2025 #VidLLMs #VideoLLMs #Multimodal #ai #llms

AI at Meta (@aiatmeta)

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
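
A quick note on what "17B-active-parameter" means for a mixture-of-experts model: per token, only the shared weights plus the routed experts execute, so per-token compute tracks the active count rather than the total. The sketch below uses an illustrative parameter split, not Meta's published breakdown.

```python
# Hedged sketch: total vs. active parameters in a top-k routed MoE model.
# The shared/expert sizes are illustrative assumptions only.
def moe_params(shared: float, per_expert: float, n_experts: int, top_k: int):
    total = shared + n_experts * per_expert   # all weights stored
    active = shared + top_k * per_expert      # weights actually run per token
    return total, active

# e.g. ~11B shared + 16 experts of ~6.1B each with top-1 routing
total, active = moe_params(11e9, 6.1e9, n_experts=16, top_k=1)
print(f"total ≈ {total / 1e9:.0f}B, active ≈ {active / 1e9:.0f}B per token")
```
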
Hugging Face (@huggingface)

We are excited to partner with AI at Meta to welcome Llama 4 Maverick (402B) & Scout (109B) natively multimodal Language Models on the Hugging Face Hub with Xet 🤗

Both MoE models were trained on up to 40 trillion tokens, pre-trained on 200 languages, and significantly outperform their
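
Since the weights are on the Hub, loading should follow the usual transformers path. A minimal sketch, assuming the repo ID below (gated weights also require accepting the license and logging in with a Hugging Face token); check the model card for the exact pipeline task and ID.

```python
# Hedged sketch: run a Hub-hosted multimodal Llama 4 checkpoint via transformers.
# MODEL_ID and the pipeline task are assumptions -- confirm against the model card.
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repo ID

pipe = pipeline("image-text-to-text", model=MODEL_ID, device_map="auto")
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/photo.jpg"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
print(pipe(text=messages, max_new_tokens=64))
```
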
Li Bo (@boli68567011)

🚀 Introducing Aero-1-Audio — a compact yet mighty audio model.

⚡ Trained in <24h on just 16×H100
🎧 Handles 15+ min audio seamlessly
💡 Outperforms bigger models like Whisper, Qwen-2-Audio & commercial services from ElevenLabs/Scribe

Aero shows: smart data > massive scale.

penghao wu (@penghaowu2)

🧵[1/n] Our #ICML2025 paper, Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM, is now on arXiv! Orthogonal to token-reduction approaches, we study computation-level redundancy on vision tokens within decoder LMMs.
Paper Link: arxiv.org/abs/2505.15816