Yuanhan (John) Zhang (@zhang_yuanhan)'s Twitter Profile
Yuanhan (John) Zhang

@zhang_yuanhan

3rd-year Ph.D. student at @MMLabNTU

ID: 1302042473116495874

Link: https://zhangyuanhan-ai.github.io/ · Joined: 05-09-2020 00:35:05

210 Tweets

862 Followers

252 Following

Mu Cai (@mucai7)

Now TemporalBench is fully public! See how your video understanding model performs on TemporalBench before CVPR! 

🤗 Dataset: huggingface.co/datasets/micro…
📎 Integrated into lmms-eval (systematic eval): github.com/EvolvingLMMs-L… (great work by Chunyuan Li and Yuanhan (John) Zhang)
📗 Our
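
For readers who want to try the benchmark before a full harness run, below is a minimal sketch of loading the dataset from the Hugging Face Hub and scoring answers directly. The dataset ID and field names are assumptions (the URLs in the tweet are truncated); the authoritative entry points are the dataset card and the lmms-eval task config.

```python
# Hedged sketch: pull TemporalBench from the Hub and compute plain accuracy.
# DATASET_ID and the field names are hypothetical -- verify on the dataset card.
from datasets import load_dataset

DATASET_ID = "microsoft/TemporalBench"  # hypothetical ID, inferred from the truncated URL

def evaluate(answer_fn, split: str = "test") -> float:
    """answer_fn(video, question) -> the model's answer string."""
    ds = load_dataset(DATASET_ID, split=split)
    correct = 0
    for ex in ds:  # assumed fields: "video", "question", "answer"
        pred = answer_fn(ex["video"], ex["question"])
        correct += int(pred.strip().lower() == ex["answer"].strip().lower())
    return correct / len(ds)
```
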
Zhenyu Jiang (@stevetod1998)

Excited to share that we’re organizing the Generative Models for Robot Learning workshop at #ICLR2025! Join us and submit your paper!

Xiang Yue@ICLR2025🇸🇬 (@xiangyue96)

Introducing Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
videommmu.github.io

One significant difference of Video-MMMU is that it does not only measure the "absolute" accuracy of models but also measures the "delta" accuracy, where
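
The tweet is cut off before the definition, but the "delta" idea can be illustrated with a small sketch: query the model on the same questions before and after it watches the video, then normalize the gain by the remaining headroom. The normalization below is one plausible formulation, not necessarily the paper's exact metric.

```python
# Hedged sketch of a "delta" (knowledge-gain) metric: improvement after
# watching the video, normalized by how much room there was to improve.
def delta_accuracy(acc_before: float, acc_after: float) -> float:
    if acc_before >= 1.0:
        return 0.0  # no headroom left to gain
    return 100.0 * (acc_after - acc_before) / (1.0 - acc_before)

# e.g. going from 40% to 55% accuracy after watching gives a delta of 25.0
print(delta_accuracy(0.40, 0.55))
```
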
Kairui Hu (@kairuicarry)

🚀 Updated Video-MMMU Leaderboard for Qwen-2.5-VL!

Congrats to Qwen! 🎉 Qwen-2.5-VL-72B achieves GPT-4o-level performance on Video-MMMU and reaches a high ΔKnowledge, marking a significant advance among open-source models! 🏆
Qwen-2.5-VL-7B achieves SOTA performance
Yuanhan (John) Zhang (@zhang_yuanhan)

Introducing Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos 

Video-MMMU goes beyond simply assessing how video LLMs comprehend professional content. 

We explore how these models apply the acquired knowledge to tackle new tasks: By
Felix Juefei Xu (@felixudr)

📣The Second Workshop on Efficient and On-Device Generation (EDGE) at CVPR 2025 will focus on the latest advancements of generative AI in the computer vision domain, with an emphasis on efficiencies across multiple aspects. We encourage techniques that enable generative models to
AIGCLINK (@aigclink)

Cool, a daily-life AI assistant: EgoLife. It works like a personal assistant that helps you remember important events, track habits, recall past events, and manage tasks. Data is collected through Meta Aria glasses, recorded with cameras and sensors. It has two main components. EgoGPT: understands what you are doing in real time, can make sense of the sounds around you, and automatically logs important events

TwelveLabs (twelvelabs.io) (@twelve_labs)

Kairui Hu will present Video-MMMU - the first benchmark designed to evaluate how effectively Large Multimodal Models (LMMs) acquire knowledge from professional videos. x.com/kairuicarry/st…

Max Forbes (@maxforbes)

working on a post that's basically "how to get a paper accepted," using a case study of one of my own that went from reject (2.5, 3, 3) to accept (4, 4.5, 4.5) with just one week of revisions
VidLLMs CVPR2025 (@vidllms)

🚨 🚨 🚨 News 🚨 🚨 🚨
Paper Submission for the Video LLMs Workshop at #CVPR2025 is open now!

Call for papers: crcv.ucf.edu/cvpr2025-vidll…
OpenReview: openreview.net/group?id=thecv…

#CVPR2025 #VidLLMs #VideoLLMs #Multimodal #ai #llms

AI at Meta (@aiatmeta)

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
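
A quick note on what "17B-active-parameter" means for a mixture-of-experts model: per token, only the shared weights plus the routed experts execute, so per-token compute tracks the active count rather than the total. The sketch below uses an illustrative parameter split, not Meta's published breakdown.

```python
# Hedged sketch: total vs. active parameters in a top-k routed MoE model.
# The shared/expert sizes are illustrative assumptions only.
def moe_params(shared: float, per_expert: float, n_experts: int, top_k: int):
    total = shared + n_experts * per_expert   # all weights stored
    active = shared + top_k * per_expert      # weights actually run per token
    return total, active

# e.g. ~11B shared + 16 experts of ~6.1B each with top-1 routing
total, active = moe_params(11e9, 6.1e9, n_experts=16, top_k=1)
print(f"total ≈ {total / 1e9:.0f}B, active ≈ {active / 1e9:.0f}B per token")
```
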
Hugging Face (@huggingface)

We are excited to partner with AI at Meta to welcome Llama 4 Maverick (402B) & Scout (109B) natively multimodal Language Models on the Hugging Face Hub with Xet 🤗

Both MoE models were trained on up to 40 trillion tokens, pre-trained on 200 languages, and significantly outperform their
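
Since the weights are on the Hub, loading should follow the usual transformers path. A minimal sketch, assuming the repo ID below (gated weights also require accepting the license and logging in with a Hugging Face token); check the model card for the exact pipeline task and ID.

```python
# Hedged sketch: run a Hub-hosted multimodal Llama 4 checkpoint via transformers.
# MODEL_ID and the pipeline task are assumptions -- confirm against the model card.
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repo ID

pipe = pipeline("image-text-to-text", model=MODEL_ID, device_map="auto")
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/photo.jpg"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
print(pipe(text=messages, max_new_tokens=64))
```
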
Li Bo (@boli68567011)

🚀 Introducing Aero-1-Audio — a compact yet mighty audio model.

⚡ Trained in <24h on just 16×H100
🎧 Handles 15+ min audio seamlessly
💡 Outperforms bigger models like Whisper, Qwen-2-Audio & commercial services from ElevenLabs/Scribe

Aero shows: smart data > massive scale.

penghao wu (@penghaowu2)

🧵[1/n] Our #ICML2025 paper, Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM, is now on arXiv! Orthogonal to token-reduction approaches, we study computation-level redundancy on vision tokens within decoder LMMs.
Paper Link: arxiv.org/abs/2505.15816