Xiaohan Wang (@xiaohanwang96)'s Twitter Profile
Xiaohan Wang

@xiaohanwang96

Postdoc @Stanford. Video Understanding, Multimodal Learning, and AI for Healthcare

ID: 794974781435244544

Website: https://wxh1996.github.io/
Joined: 05-11-2016 18:48:52

49 Tweets

227 Followers

304 Following

Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

Another Microsoft paper revealing the size of GPT-4, GPT-o1 and Claude Sonnet. 
I'm not sure how trustworthy these numbers are, but they do make a lot of sense to me.
Source: arxiv.org/pdf/2412.19260
Yuhui Zhang (@zhang_yu_hui)'s Twitter Profile Photo

🔍 Vision language models are getting better - but how do we evaluate them reliably? Introducing AutoConverter: transforming open-ended VQA into challenging multiple-choice questions!

Key findings:

1️⃣ Current open-ended VQA eval methods are flawed: rule-based metrics correlate
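A minimal sketch of the idea behind AutoConverter as described above (turning an open-ended VQA pair into a multiple-choice question by generating distractors with an LLM). The client, model name, and prompt below are illustrative assumptions for the sketch, not the authors' implementation.

```python
# Illustrative only: one way to turn an open-ended VQA pair into a multiple-choice
# question by asking an LLM for plausible distractors. The OpenAI client, model
# name, and prompt are assumptions, not the AutoConverter code.
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def to_multiple_choice(question: str, correct_answer: str, n_distractors: int = 3) -> dict:
    """Build a multiple-choice item from an open-ended question/answer pair."""
    prompt = (
        f"Question: {question}\n"
        f"Correct answer: {correct_answer}\n"
        f"Write {n_distractors} plausible but incorrect answers, one per line."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    distractors = [l.strip("- ").strip() for l in lines if l.strip()][:n_distractors]
    options = distractors + [correct_answer]
    random.shuffle(options)
    return {"question": question, "options": options, "answer": correct_answer}
```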
Alejandro Lozano (@ale9806_)'s Twitter Profile Photo

Biomedical datasets are often confined to specific domains, missing valuable insights from adjacent fields. To bridge this gap, we present BIOMEDICA: an open-source framework to extract and serialize PMC-OA.

📄Paper: lnkd.in/dUUgA6rR 
🌐Website: lnkd.in/dnqZZW4M
Junyang Lin (@justinlin610)'s Twitter Profile Photo

Qwen2.5-VL! Qwen2.5-VL! Qwen2.5-VL! Try our new Qwen2.5-VL in Qwen Chat: chat.qwenlm.ai
Finally, after months, we release the new version of our vision language model, Qwen2.5-VL! This time, we focus on more essential problems. Notably, we highlight the importance of

Christopher Manning (@chrmanning)'s Twitter Profile Photo

Re: “Every major breakthrough in AI has been American”: America does itself no favors when it overestimates its specialness. Yes, the center of the AI industry is the US (California!), but many of the breakthroughs of (neural, gradient-based) AI happened elsewhere:
• LSTMs,

Orr Zohar @ ICLR’25 (@orr_zohar)'s Twitter Profile Photo

🚨🚨🚨SmolVLM2 is here - and it's a tiny titan! This nano-sized model crushes image and video perception👁️🧠, all while being small enough to run on your iPhone, bringing cutting-edge multimodal AI to every device📲. No more cloud dependence! Your data is yours! #MobileAI

Serena Yeung-Levy (@yeung_levy)'s Twitter Profile Photo

Just published in Science Advances, our work demonstrating the ability of AI and 3D computer vision to produce automated measurement of human interactions in video data from early child development research -- providing over 100x time savings compared to human annotation and

James Burgess (at ICLR 2025) (@jmhb0)'s Twitter Profile Photo

🚨 Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert video?
🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵

Yuhui Zhang (@zhang_yu_hui)'s Twitter Profile Photo

Excited to announce that AutoConverter has been accepted to #CVPR2025 and VMCBench is now supported by both VLMEvalKit and lmms-eval! 🎉

Try our tools: 
▪️ AutoConverter demo: yuhui-zh15.github.io/AutoConverter-…
▪️ VMCBench: huggingface.co/datasets/suyc2… (supported by VLMEvalKit and lmms-eval)
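For readers who want to inspect VMCBench directly, here is a minimal sketch using the standard Hugging Face `datasets` API. The dataset ID is a placeholder, since the huggingface.co link above is truncated; the split and column names are not specified in the tweet either.

```python
# Minimal sketch, assuming the standard Hugging Face `datasets` API.
# The dataset ID is a placeholder: the huggingface.co link in the tweet is truncated,
# so substitute the real VMCBench repository name before running.
from datasets import load_dataset

VMCBENCH_ID = "<org>/<VMCBench-dataset-name>"  # placeholder, see the link above

dataset = load_dataset(VMCBENCH_ID)
for split_name, split in dataset.items():
    print(split_name, len(split), split.column_names)
```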
James Burgess (at ICLR 2025) (@jmhb0)'s Twitter Profile Photo

Introducing MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research #CVPR2025 

✅ 1k multimodal reasoning VQAs testing MLLMs for science
🧑‍🔬 Biology researchers manually created the questions 
🤖 RefineBot: a method for fixing QA language shortcuts
🧵
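The tweet names RefineBot as a method for fixing QA language shortcuts. A common generic diagnostic for such shortcuts, sketched below, is a "blind" text-only pass: if a model picks the right option without seeing the image, the question can likely be answered from language priors alone. This is not the RefineBot method itself, and `ask_text_llm` is a hypothetical callable.

```python
# Generic diagnostic for "language shortcuts" in multiple-choice VQA (not the
# RefineBot method): if a text-only model answers correctly without the image,
# the question is probably solvable from language priors alone.
# `ask_text_llm` is a hypothetical callable returning the chosen option letter.
from typing import Callable, Sequence

def has_language_shortcut(
    ask_text_llm: Callable[[str], str],
    question: str,
    options: Sequence[str],
    answer_index: int,
) -> bool:
    letters = "ABCDEFGH"
    prompt = (
        question
        + "\n"
        + "\n".join(f"{letters[i]}. {opt}" for i, opt in enumerate(options))
        + "\nAnswer with a single letter."
    )
    choice = ask_text_llm(prompt).strip().upper()[:1]
    return choice == letters[answer_index]
```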
Xiaohan Wang (@xiaohanwang96)'s Twitter Profile Photo

🚨 Excited to co-organize our #CVPR2025 workshop on "Multimodal Foundation Models for Biomedicine: Challenges and Opportunities" — where vision, language, and health intersect!
We’re bringing together experts from #CV, #NLP, and #healthcare to explore:
🧠 Technical challenges (e.g.
Orr Zohar @ ICLR’25 (@orr_zohar)'s Twitter Profile Photo

🤗The SmolVLM report is out, with all the experiments, findings, and insights that led to high performance at tiny sizes🤏. 
📱These models can run on most mobile/edge devices. 
📖Give it a look!
Orr Zohar @ ICLR’25 (@orr_zohar)'s Twitter Profile Photo

Excited to present Video-STaR at #ICLR2025’s poster session tomorrow!
🗓️ Visit me at Poster 91, 10:00 AM–12:30 PM
🚀 Dive into our work on advancing video reasoning using self-training:
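Video-STaR is described here as advancing video reasoning through self-training. The loop below is only a paraphrase of the generic self-training recipe (generate, verify against existing labels, keep, fine-tune); every callable is a placeholder, and this is not the Video-STaR implementation.

```python
# Generic self-training loop in the STaR spirit: the model labels videos itself,
# only outputs consistent with existing supervision are kept, and the model is
# fine-tuned on its own verified outputs. All callables are placeholders.
from typing import Any, Callable, Iterable, Tuple

def self_training_round(
    model: Any,
    labeled_videos: Iterable[Tuple[Any, str]],
    generate: Callable[[Any, Any], str],
    verify: Callable[[str, str], bool],
    finetune: Callable[[Any, list], Any],
) -> Any:
    kept = []
    for video, label in labeled_videos:
        candidate = generate(model, video)   # model proposes an answer / rationale
        if verify(candidate, label):         # keep only outputs consistent with the label
            kept.append((video, candidate))
    return finetune(model, kept)             # train on the verified self-generated data
```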
Yuhui Zhang (@zhang_yu_hui)'s Twitter Profile Photo

📢 The First Workshop on Multimodal Foundation Models for Biomedicine (MMFM-BIOMED) at #CVPR2025 is still accepting submissions until May 7, 11:59 PM PT! 
Join speakers from Stanford, Google, MIT & more exploring the intersection of #CV, #NLP & #healthcare.
Submit your 4-page
Benjamin Feuer (@feuerbenjamin)'s Twitter Profile Photo

So excited to announce the DCVLR (Data Curation for Vision-Language Reasoning) competition at NeurIPS 2025, led by Oumi and sponsored by Lambda!
🌟 open-data 🌟
🤖 open-models 🤖
💻 open-source 💻
💪 anyone can compete for free 💪
dcvlr-neurips.github.io
🧵 1 / n

Xiaohan Wang (@xiaohanwang96)'s Twitter Profile Photo

🧠 How can we truly test long-context video understanding in video-LMMs?
⏱️ TimeScope benchmarks models from 1 min to 8 hours using “needle-in-a-haystack” probes.
🚀 Gemini 2.5-Pro leads the pack—but even it struggles as context length grows. Long-range memory is still a
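The tweet is cut off, but the “needle-in-a-haystack” protocol it names can be sketched generically: hide a short “needle” clip at a random point inside a long distractor video, then score whether the model can answer a question that depends only on the needle. Everything below (the frame lists and `ask_video_llm`) is a hypothetical placeholder, not the TimeScope code.

```python
# Conceptual sketch of a needle-in-a-haystack probe for long-video evaluation:
# splice a short "needle" clip into a long distractor video, then check whether
# the model answers a question that depends only on the needle.
# `ask_video_llm` and the frame lists are hypothetical placeholders.
import random
from typing import Callable, List, Tuple

def build_probe(haystack: List, needle: List) -> Tuple[List, int]:
    """Insert the needle clip at a random position; return the probe and the index."""
    pos = random.randint(0, len(haystack))
    return haystack[:pos] + needle + haystack[pos:], pos

def run_probe(
    ask_video_llm: Callable[[List, str], str],
    haystack: List,
    needle: List,
    question: str,
    expected: str,
) -> dict:
    probe, pos = build_probe(haystack, needle)
    answer = ask_video_llm(probe, question)
    return {"needle_position": pos, "correct": expected.lower() in answer.lower()}
```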