Kaiwen Zhou's (@kaiwenzhou9) Twitter Profile
Kaiwen Zhou

@kaiwenzhou9

A CSE PhD student at @ucsc, working on multimodal AI agents and responsible AI. Previously: @Samsung_RA, @hri_usa. Looking for a summer 2025 internship.

ID: 1506422274307416069

Joined: 23-03-2022 00:07:29

80 Tweets

184 Followers

197 Following

Qianqi "Jackie" Yan (@qianqi_yan) 's Twitter Profile Photo

New Paper Alert: Multimodal Inconsistency Reasoning (MMIR)! ✨

Ever visited a webpage where the text says “IKEA desk” yet images and descriptions elsewhere show a totally different brand? Or read a slide that shows “50% growth” in the text but the accompanying chart looks flat?
Xin Eric Wang @ ICLR 2025 (@xwang_lk):

𝐁𝐞𝐚𝐭𝐢𝐧𝐠 𝐎𝐩𝐞𝐧𝐀𝐈 𝐢𝐬 𝐧𝐨𝐭 𝐚𝐬 𝐡𝐚𝐫𝐝 𝐚𝐬 𝐲𝐨𝐮 𝐭𝐡𝐢𝐧𝐤. If you don't believe you can compete, you've already lost. Winning starts with mindset.

🚀Introducing 𝑨𝒈𝒆𝒏𝒕 𝑺2, 𝐭𝐡𝐞 𝐰𝐨𝐫𝐥𝐝'𝐬 𝐛𝐞𝐬𝐭 𝐜𝐨𝐦𝐩𝐮𝐭𝐞𝐫-𝐮𝐬𝐞 𝐚𝐠𝐞𝐧𝐭, and the second
Kaiwen Zhou (@kaiwenzhou9):

Could not attend #ICLR2025 🥲. But Chengzhi Liu will present our Multimodal Situational Safety paper on April 25, 3:00-5:30 pm in Hall 3 + Hall 2B #538. Come check it out!

Yue Fan (@yfan_ucsc):

Before o3 impressed everyone with 🔥visual reasoning🔥, we already had faith in and were exploring models that can think with images. 🚀

Here’s our shot, GRIT: Grounded Reasoning with Images & Texts that trains MLLMs to think while performing visual grounding. It is done via RL
Chengzhi Liu (@liuchen02938149):

🧠 More Thinking, Less Seeing? 👀 Exploring the Balance Between Reasoning and Hallucination in Multimodal Reasoning Models!

Currently, many multimodal reasoning models, while striving for enhanced reasoning capabilities, often neglect the issue of visual hallucinations. While
Jing Gu (@jinggu4ai):

🚨 PhyWorldBench, New Paper Alert! 🚨

Video‑generation models are jaw‑dropping—they conjure gorgeous scenes in seconds. But can they truly simulate the real world, respecting (or intentionally bending) the laws of physics?

Introducing PhyWorldBench, the large‑scale benchmark I