Yuhui Zhang (@zhang_yu_hui)'s Twitter Profile
Yuhui Zhang

@zhang_yu_hui

CS PhD @ Stanford

ID: 969422731748950018

Website: https://cs.stanford.edu/~yuhuiz · Joined: 02-03-2018 04:02:45

90 Tweets

660 Followers

176 Following

Yuhui Zhang (@zhang_yu_hui):

🤔 Why are VLMs (even GPT-4V) worse at image classification than CLIP, despite using CLIP as their vision encoder?

Presenting VLMClassifier at #NeurIPS2024:
⏰ Dec 11 (Wed), 11:00-14:00
📍 East Hall #3710

Key findings:
1️⃣ VLMs dramatically underperform CLIP (>20% gap)
2️⃣ After
Yuhui Zhang (@zhang_yu_hui):

๐Ÿ” Vision language models are getting better - but how do we evaluate them reliably? Introducing AutoConverter: transforming open-ended VQA into challenging multiple-choice questions! Key findings: 1๏ธโƒฃ Current open-ended VQA eval methods are flawed: rule-based metrics correlate

๐Ÿ” Vision language models are getting better - but how do we evaluate them reliably? Introducing AutoConverter: transforming open-ended VQA into challenging multiple-choice questions!

Key findings:

1๏ธโƒฃ Current open-ended VQA eval methods are flawed: rule-based metrics correlate
Alejandro Lozano (@ale9806_):

Biomedical datasets are often confined to specific domains, missing valuable insights from adjacent fields. To bridge this gap, we present BIOMEDICA: an open-source framework to extract and serialize PMC-OA.

📄 Paper: lnkd.in/dUUgA6rR
🌐 Website: lnkd.in/dnqZZW4M
Xiaohan Wang (@xiaohanwang96):

🚀 Introducing Temporal Preference Optimization (TPO) – a video-centric post-training framework that enhances temporal grounding in long-form videos for Video-LMMs! 🎥✨

🔍 Key Highlights:
✅ Self-improvement via preference learning – Models learn to differentiate well-grounded
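
"Preference learning" here presumably follows a DPO-style objective: the policy is trained to rank the well-grounded response above the poorly grounded one, relative to a frozen reference model. Below is the generic DPO loss as a reference point; TPO's exact formulation may differ.

```python
# Generic DPO loss over (chosen, rejected) response pairs.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each argument is the summed log-prob of a response under a model:
    pi_* from the policy being trained, ref_* from the frozen reference."""
    policy_margin = pi_chosen - pi_rejected
    reference_margin = ref_chosen - ref_rejected
    # Maximize the log-odds that the policy prefers the chosen response
    # more strongly than the reference model does.
    return -F.logsigmoid(beta * (policy_margin - reference_margin)).mean()

# Toy usage with made-up log-probs for a single pair:
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-14.1]),
                torch.tensor([-13.0]), torch.tensor([-13.2]))
```
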
Avinab Saha 🇮🇳 (@avinab_saha):

🚀 Announcing the XAI4CV Workshop at #CVPR2025! We look forward to gathering experts to explore challenges and opportunities in XAI for CV, advance new ideas, and push the field to its limits! Join us in Nashville, TN, this June. 🔗 xai4cv.github.io #CVPR2025 #XAI

Sukrut Rao (@sukrutrao):

Submit your latest work (papers, demos) in #XAI to the 4th Explainable AI for Computer Vision (XAI4CV) Workshop at #CVPR2025!

The deadline for the Proceedings Track is March 10, 2025

Details: xai4cv.github.io/workshop_cvpr25
Submission Site: cmt3.research.microsoft.com/XAI4CV2025

#CVPR2025 Explainable AI
James Burgess (at ICLR 2025) (@jmhb0):

🚨 Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you're learning a sports skill like kicking: can an AI tell how your kick differs from an expert video? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵

Yuhui Zhang (@zhang_yu_hui):

Excited to announce that AutoConverter has been accepted to #CVPR2025 and VMCBench is now supported by both VLMEvalKit and lmms-eval! 🎉

Try our tools:
▪️ AutoConverter demo: yuhui-zh15.github.io/AutoConverter-…
▪️ VMCBench: huggingface.co/datasets/suyc2… (supported by VLMEvalKit and lmms-eval)
Xiaohan Wang (@xiaohanwang96):

🚨 Excited to co-organize our #CVPR2025 workshop on "Multimodal Foundation Models for Biomedicine: Challenges and Opportunities" – where vision, language, and health intersect!

We're bringing together experts from #CV, #NLP, and #healthcare to explore:
🧠 Technical challenges (e.g.
Anjiang Wei (@anjiangw):

🚨 New benchmark drop: EquiBench 🚨

We introduce equivalence checking as a rigorous test of LLMs' code reasoning ability, featuring 4 languages, 6 categories, and 2,400 program pairs.

Top models still struggle with this task.

🔗 Website: anjiang-wei.github.io/EquiBench-Webs…
📝 Preprint:
Yuhui Zhang (@zhang_yu_hui):

Three papers being presented by my amazing collaborators at #ICLR2025! 🌟 (sadly I can't make it)

1. Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations 🔍

   A deep dive into mechanistic interpretation techniques for VLMs & future
James Burgess (at ICLR 2025) (@jmhb0):

I'm at #ICLR2025 presenting "Video Action Differencing". Keen to chat with anyone interested in MLLMs - both for general data & for scientific reasoning

Yuhui Zhang (@zhang_yu_hui):

📢 The First Workshop on Multimodal Foundation Models for Biomedicine (MMFM-BIOMED) at #CVPR2025 is still accepting submissions until May 7, 11:59 PM PT!

Join speakers from Stanford, Google, MIT & more exploring the intersection of #CV, #NLP & #healthcare.

Submit your 4-page
Thao Nguyen (@thao_nguyen26):

📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains!

📅 Deadline: May 24, AoE
🔗 Website: dataworldicml2025.github.io

We have an amazing lineup of speakers + panelists from various institutions and application areas.
Yuhui Zhang (@zhang_yu_hui):

📢 Really excited to host the Data Curation for Vision Language Reasoning Challenge (DCVLR) @ NeurIPS 2025 and to include VMCBench as one of the evaluation sets! We're looking forward to seeing the top solutions (with prize money!) – huge thanks to Benjamin Feuer and the team for

Yuhui Zhang (@zhang_yu_hui):

Join us on Saturday at West 208-209 for our ICML Conference workshop on data-centric AI! ✨ Looking forward to great discussions and meeting both old and new friends!