Kaiwen Zhou (@kaiwenzhou9)'s Twitter Profile
Kaiwen Zhou

@kaiwenzhou9

A CSE PhD student at @ucsc, working on multimodal AI agents and responsible AI. Previously: @Samsung_RA, @hri_usa. Looking for a summer 2025 internship.

ID: 1506422274307416069

23-03-2022 00:07:29

80 Tweets

184 Followers

197 Following

Qianqi "Jackie" Yan (@qianqi_yan)

New Paper Alert: Multimodal Inconsistency Reasoning (MMIR)! ✨

Ever visited a webpage where the text says “IKEA desk” yet images and descriptions elsewhere show a totally different brand? Or read a slide that shows “50% growth” in the text but the accompanying chart looks flat?

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

Beating OpenAI is not as hard as you think. If you don't believe you can compete, you've already lost. Winning starts with mindset.

🚀 Introducing Agent S2, the world's best computer-use agent, and the second

Kaiwen Zhou (@kaiwenzhou9)

Could not attend #ICLR2025 🥲. But Chengzhi Liu will present our Multimodal Situational Safety paper on April 25, 3:00-5:30 pm in Hall 3 + Hall 2B #538. Welcome to check it out!

Yue Fan (@yfan_ucsc)

Before o3 impressed everyone with 🔥visual reasoning🔥, we already had faith in and were exploring models that can think with images. 🚀

Here’s our shot, GRIT: Grounded Reasoning with Images & Texts that trains MLLMs to think while performing visual grounding. It is done via RL

Kaiwen Zhou (@kaiwenzhou9)

I still remember people commenting on the close deadlines between ARR May and NeurIPS. Just checked: the ARR Oct deadline is Oct 6th 🤔

Chengzhi Liu (@liuchen02938149)

🧠 More Thinking, Less Seeing? 👀 Exploring the Balance Between Reasoning and Hallucination in Multimodal Reasoning Models!

Currently, many multimodal reasoning models, while striving for enhanced reasoning capabilities, often neglect the issue of visual hallucinations. While

Jing Gu (@jinggu4ai)

🚨 PhyWorldBench, New Paper Alert! 🚨

Video-generation models are jaw-dropping: they conjure gorgeous scenes in seconds. But can they truly simulate the real world, respecting (or intentionally bending) the laws of physics?
Introducing PhyWorldBench, the large-scale benchmark I