Yongshuo Zong (@yongshuozong) 's Twitter Profile
Yongshuo Zong

@yongshuozong

PhD Student @ University of Edinburgh

ID: 1330469385778192385

Link: http://ys-zong.github.io · Joined: 22-11-2020 11:13:20

65 Tweets

112 Followers

263 Following

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Evaluating the capabilities of multimodal in-context learning of #VLLMs? You can do better than VQA and captioning! Introducing *VL-ICL Bench* for both image-to-text and text-to-image #ICL. Project page: ys-zong.github.io/VL-ICL/

Ondrej Bohdal (@obohdal) 's Twitter Profile Photo

Curious about how to better evaluate in-context learning in multimodal #LLMs? We introduce VL-ICL Bench to enable rigorous evaluation of MLLMs' ability to learn from a few examples✨. Details at ys-zong.github.io/VL-ICL
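
For readers unfamiliar with the setup, an image-to-text ICL query is typically an interleaved sequence of (image, answer) support examples followed by the query image. Below is a minimal sketch assuming a generic OpenAI-style multimodal chat message format; `build_icl_messages`, `support_set`, and `query_image_url` are illustrative names, not the VL-ICL Bench API.

```python
# Minimal sketch of assembling an image-to-text in-context learning query
# (illustrative only; not the VL-ICL Bench code).
def build_icl_messages(task_instruction, support_set, query_image_url):
    """support_set: list of (image_url, answer) few-shot examples."""
    content = [{"type": "text", "text": task_instruction}]
    for image_url, answer in support_set:
        # Each in-context example is an image followed by its ground-truth answer.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
        content.append({"type": "text", "text": f"Answer: {answer}"})
    # The query image comes last; the model must infer the task from the examples.
    content.append({"type": "image_url", "image_url": {"url": query_image_url}})
    content.append({"type": "text", "text": "Answer:"})
    return [{"role": "user", "content": content}]
```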

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

VLGuard is accepted to #ICML2024! Check out our strong baseline for 🛡️safeguarding🛡️ VLLMs: ys-zong.github.io/VLGuard/

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Prompting for MMLU and other MCQ benchmarks matters a lot. Our #ICML2024 paper finds that even if you only permute the *order of the options*, the predictions can be completely different. TL;DR, we need better evaluations. arxiv.org/abs/2310.01651
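
As a concrete illustration of the permutation sensitivity described here, a minimal sketch follows; `predict` is a hypothetical callable wrapping whichever model is being evaluated, and this is not the paper's released code.

```python
# Minimal sketch of checking whether a model's MCQ answer changes when the
# options are reordered (illustrative; not the paper's code).
from itertools import permutations

def is_order_sensitive(predict, question, options):
    """predict(question, formatted_options) -> answer letter, e.g. 'B'."""
    chosen = set()
    for perm in permutations(options):
        # Re-letter the options (A, B, C, ...) for each ordering.
        formatted = [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(perm)]
        letter = predict(question, formatted)
        chosen.add(perm[ord(letter.strip().upper()) - 65])  # map letter back to option text
    return len(chosen) > 1  # True if any reordering flips the prediction
```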

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Just tried out Phi-3 Vision and it's really good! Also great to see Phi-3-V uses our *VLGuard* as one of the main benchmarks for safety evaluation🛡️. We welcome more users to try out VLGuard for safeguarding and evaluating VLLMs before deployment! Paper: arxiv.org/abs/2402.02207

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Late to the party but I'll be at ICML Conference from the 24th to the 27th, presenting two main papers on Thursday (safety, robustness) and a workshop paper (long-context) on Friday about *vision-language models*! Do stop by my posters; I look forward to meeting old and new friends🥳

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Our survey is finally accepted to T-PAMI! There isn't much fancy hype in the paper🙅‍♂️ We focused on clearer categorizations that summarize previous efforts (pre-LLM era) and also offer insights for multimodal LLMs. Newest version here: arxiv.org/abs/2304.01008

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Excited to share that I finally started my internship at Amazon Web Services this week in Bellevue! Looking forward to catching up with old and new friends in the Seattle/Bellevue area. Let’s connect!

elvis (@omarsar0) 's Twitter Profile Photo

This paper provides a preliminary exploration of the o1 model in medical scenarios.

Strength: o1 surpasses the previous GPT-4 in accuracy by an average of 6.2% and 6.6% across 19 datasets and two newly created complex QA scenarios.

Weakness: Identifies hallucination,
yuyin zhou@ICLR'25 (@yuyinzhou_cs) 's Twitter Profile Photo

OpenAI’s new o1(-preview) model has shown impressive reasoning capabilities across various general NLP tasks, but how does it hold up in the medical domain? A big thank you to Open Life Science AI for sharing our latest research, *"A Preliminary Study of o1 in Medicine: Are We Getting

Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Interesting finding! Inference scaling can indeed hurt performance sometimes: when we permute multiple-choice options and do majority voting, many models perform worse than with the original single input on "hard" questions. More details in our ICML paper: arxiv.org/abs/2310.01651
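
A hedged sketch of the permutation-plus-majority-voting procedure referred to here, reusing the same hypothetical `predict` callable as in the earlier sketch (not the paper's code):

```python
# Majority voting over option permutations (illustrative sketch).
from collections import Counter
from itertools import permutations

def majority_vote(predict, question, options):
    votes = Counter()
    for perm in permutations(options):
        formatted = [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(perm)]
        letter = predict(question, formatted)
        votes[perm[ord(letter.strip().upper()) - 65]] += 1  # tally the underlying option text
    return votes.most_common(1)[0][0]  # the option chosen most often across orderings
```

The comparison the tweet draws is between this aggregated answer and a single call with the original option ordering.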

Zhuang Liu (@liuzhuang1234) 's Twitter Profile Photo

When you click into the link you know it.. I was just watching Ilya's talk on seq2seq at NeurIPS 2014. Highly recommend youtube.com/watch?v=-uyXE7…

Two quotes that I remembered from the talk, still more true than ever:
1. "We use minimum innovation for maximum results"
2. "If you

Jiao Sun (@sunjiao123sun_) 's Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans! 

Can’t believe this happened at the best AI conference NeurIPS Conference

We have ethical reviews for authors, but missed it for invited speakers? 😡
Yongshuo Zong (@yongshuozong) 's Twitter Profile Photo

Our VL-ICL Bench is accepted to ICLR 2025! It's been almost a year since we developed it, yet state-of-the-art VLMs still struggle to learn in-context. Great to work with Ondrej Bohdal and Timothy Hospedales.

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview

Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing

Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k
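
The computation-communication overlap item is, in general terms, achieved by issuing collectives asynchronously and doing independent work before waiting on them. A generic PyTorch-style sketch of that pattern follows; it is an assumption about the technique in general, not DeepSeek's actual implementation, and `run_local_compute` / `run_experts` are placeholder callables.

```python
# Generic computation-communication overlap pattern (not DeepSeek's code).
# Assumes torch.distributed has been initialized with an appropriate backend.
import torch
import torch.distributed as dist

def overlapped_dispatch(tokens, run_local_compute, run_experts):
    recv = torch.empty_like(tokens)
    # Launch the expert-parallel all-to-all without blocking.
    handle = dist.all_to_all_single(recv, tokens, async_op=True)
    local_out = run_local_compute()  # work that does not depend on the dispatched tokens
    handle.wait()                    # block only once the received tokens are needed
    return run_experts(recv), local_out
```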

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Vision-Language Models struggle to ground complex instructions precisely at the pixel level.

This paper introduces Ground-V, a new automatically generated dataset for teaching VLMs how to handle complex instructions for accurate pixel grounding.

Methods 🔧:

→ Ground-V dataset