Yongshuo Zong (@yongshuozong) 's Twitter Profile
Yongshuo Zong

@yongshuozong

PhD Student @ University of Edinburgh

ID: 1330469385778192385

Link: http://ys-zong.github.io · Joined: 22-11-2020 11:13:20

65 Tweets

112 Followers

263 Following

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Evaluating the capabilities of multimodal in-context learning of #VLLMs? You can do better than VQA and captioning! Introducing *VL-ICL Bench* for both image-to-text and text-to-image #ICL. Project page: ys-zong.github.io/VL-ICL/

Ondrej Bohdal (@obohdal) 's Twitter Profile Photo

Curious about how to better evaluate in-context learning in multimodal #LLMs? We introduce VL-ICL Bench to enable rigorous evaluation of MLLMs' ability to learn from a few examples✨. Details at ys-zong.github.io/VL-ICL
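
For readers unfamiliar with the setup, an image-to-text ICL query is typically an interleaved sequence of (image, answer) support examples followed by the query image. Below is a minimal sketch assuming a generic OpenAI-style multimodal chat message format; `build_icl_messages`, `support_set`, and `query_image_url` are illustrative names, not the VL-ICL Bench API.

```python
# Minimal sketch of assembling an image-to-text in-context learning query
# (illustrative only; not the VL-ICL Bench code).
def build_icl_messages(task_instruction, support_set, query_image_url):
    """support_set: list of (image_url, answer) few-shot examples."""
    content = [{"type": "text", "text": task_instruction}]
    for image_url, answer in support_set:
        # Each in-context example is an image followed by its ground-truth answer.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
        content.append({"type": "text", "text": f"Answer: {answer}"})
    # The query image comes last; the model must infer the task from the examples.
    content.append({"type": "image_url", "image_url": {"url": query_image_url}})
    content.append({"type": "text", "text": "Answer:"})
    return [{"role": "user", "content": content}]
```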

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

VLGuard is accepted to #ICML2024! Check out our strong baseline for 🛡️safeguarding🛡️ VLLMs: ys-zong.github.io/VLGuard/

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Prompting for MMLU and other MCQ benchmarks matters a lot. Our #ICML2024 paper finds that even if you only permute the *order of the options*, the predictions can be completely different. TL;DR, we need better evaluations. arxiv.org/abs/2310.01651
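
As a concrete illustration of the permutation sensitivity described here, a minimal sketch follows; `predict` is a hypothetical callable wrapping whichever model is being evaluated, and this is not the paper's released code.

```python
# Minimal sketch of checking whether a model's MCQ answer changes when the
# options are reordered (illustrative; not the paper's code).
from itertools import permutations

def is_order_sensitive(predict, question, options):
    """predict(question, formatted_options) -> answer letter, e.g. 'B'."""
    chosen = set()
    for perm in permutations(options):
        # Re-letter the options (A, B, C, ...) for each ordering.
        formatted = [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(perm)]
        letter = predict(question, formatted)
        chosen.add(perm[ord(letter.strip().upper()) - 65])  # map letter back to option text
    return len(chosen) > 1  # True if any reordering flips the prediction
```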

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Just tried out Phi-3 Vision and it's really good! Also great to see Phi-3-V uses our *VLGuard* as one of the main benchmarks for safety evaluation🛡️. We welcome more users to try out VLGuard for safeguarding and evaluating VLLMs before deployment! Paper: arxiv.org/abs/2402.02207

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Late to the party but I'll be at ICML Conference from the 24th to the 27th, presenting two main papers on Thursday (safety, robustness) and a workshop paper (long-context) on Friday about *vision-language models*! Do stop by my posters; I look forward to meeting old and new friends🥳

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Our survey is finally accepted to T-PAMI! There isn't much fancy hype in the paper🙅‍♂️ We focused on clearer categorizations that summarize previous efforts (pre-LLM era) and also offer insights for multimodal LLMs. Newest version here: arxiv.org/abs/2304.01008

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Excited to share that I finally started my internship at Amazon Web Services this week in Bellevue! Looking forward to catching up with old and new friends in the Seattle/Bellevue area. Let’s connect!

elvis (@omarsar0) 's Twitter Profile Photo

This paper provides a preliminary exploration of the o1 model in medical scenarios.

Strength: o1 surpasses the previous GPT-4 in accuracy by an average of 6.2% and 6.6% across 19 datasets and two newly created complex QA scenarios.

Weakness: Identifies hallucination,
yuyin zhou@ICLR'25 (@yuyinzhou_cs) 's Twitter Profile Photo

OpenAI’s new o1(-preview) model has shown impressive reasoning capabilities across various general NLP tasks, but how does it hold up in the medical domain? A big thank you to Open Life Science AI for sharing our latest research, *"A Preliminary Study of o1 in Medicine: Are We Getting

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Interesting finding! Inference scaling can indeed hurt performance sometimes: when we permute multiple-choice options and do majority voting, many models perform worse than with the original single input on "hard" questions. More details in our ICML paper: arxiv.org/abs/2310.01651
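
A hedged sketch of the permutation-plus-majority-voting procedure referred to here, reusing the same hypothetical `predict` callable as in the earlier sketch (not the paper's code):

```python
# Majority voting over option permutations (illustrative sketch).
from collections import Counter
from itertools import permutations

def majority_vote(predict, question, options):
    votes = Counter()
    for perm in permutations(options):
        formatted = [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(perm)]
        letter = predict(question, formatted)
        votes[perm[ord(letter.strip().upper()) - 65]] += 1  # tally the underlying option text
    return votes.most_common(1)[0][0]  # the option chosen most often across orderings
```

The comparison the tweet draws is between this aggregated answer and a single call with the original option ordering.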

Zhuang Liu (@liuzhuang1234) 's Twitter Profile Photo

When you click into the link you know it.. I was just watching Ilya's talk on seq2seq at NeurIPS 2014. Highly recommend youtube.com/watch?v=-uyXE7…

Two quotes that I remembered from the talk, still more true than ever:
1. "We use minimum innovation for maximum results"
2. "If you

Jiao Sun (@sunjiao123sun_) 's Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans! 

Can’t believe this happened at the best AI conference NeurIPS Conference

We have ethical reviews for authors, but missed it for invited speakers? 😡
Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Our VL-ICL Bench is accepted to ICLR 2025! It's been almost a year since we developed it, yet state-of-the-art VLMs still struggle to learn in-context. Great to work with Ondrej Bohdal and Timothy Hospedales.

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview

Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing

Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k
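
The computation-communication overlap item is, in general terms, achieved by issuing collectives asynchronously and doing independent work before waiting on them. A generic PyTorch-style sketch of that pattern follows; it is an assumption about the technique in general, not DeepSeek's actual implementation, and `run_local_compute` / `run_experts` are placeholder callables.

```python
# Generic computation-communication overlap pattern (not DeepSeek's code).
# Assumes torch.distributed has been initialized with an appropriate backend.
import torch
import torch.distributed as dist

def overlapped_dispatch(tokens, run_local_compute, run_experts):
    recv = torch.empty_like(tokens)
    # Launch the expert-parallel all-to-all without blocking.
    handle = dist.all_to_all_single(recv, tokens, async_op=True)
    local_out = run_local_compute()  # work that does not depend on the dispatched tokens
    handle.wait()                    # block only once the received tokens are needed
    return run_experts(recv), local_out
```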

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Vision-Language Models struggle to ground complex instructions precisely at the pixel level.

This paper introduces Ground-V, a new automatically generated dataset for teaching VLMs how to handle complex instructions for accurate pixel grounding.

Methods 🔧:

→ Ground-V dataset