GrowAIlikeAChild (@growailikechild)'s Twitter Profile
GrowAIlikeAChild

@growailikechild

growing-ai-like-a-child.github.io

ID: 1888060445111504896

Joined: 08-02-2025 03:02:02

15 Tweets

42 Followers

7 Following

Martin Ziqiao Ma (@ziqiao_ma)

Vision-Language Models (VLMs) can describe the environment, but can they refer within it? Our findings reveal a critical gap: VLMs fall short of pragmatic optimality. We identify 3 key failures of pragmatic competence in referring expression generation with VLMs: (1) cannot…

Martin Ziqiao Ma (@ziqiao_ma)

P.S. We are building GrowAIlikeAChild, an open-source community uniting researchers from computer science, cognitive science, psychology, linguistics, philosophy, and beyond. Instead of putting growing up and scaling up into opposite camps, let's build and evaluate human-like AI…

Hokin Deng (@denghokin)

🔨🔧⚙️ Have Vision Language Models solved mechanical reasoning?

If you are at ICLR 2025, please come and check out our poster "Probing Mechanical Reasoning in Large Vision Language Models" at Bidirectional Human-AI Alignment #ICLR2025!
📷 Room Garnet 216-214
🗓️ April 28

My work with GrowAIlikeAChild

Hokin Deng (@denghokin)

#ICLR please check out our poster‼️ We evaluated 209 models and all of them are stochastic parrots 🦜

🙀 Models either believe "the bigger the ball, the quicker it falls" (illusions) or "no matter how big (physics textbook, Tower of Pisa), they fall at the same time" (shortcuts).

Hokin Deng (@denghokin)

#ICLR2025 #ICLR Small poster, big insights ⁉️

Vision Language Models Know Law of Conservation without Understanding More-or-Less 🙀🙀

Come to our poster at ICLR 2025, Bidirectional Human-AI Alignment
📷 Room Garnet 216-214
❗️April 28
‼️Work by GrowAIlikeAChild

Hokin Deng (@denghokin)

#ICLR Bidirectional Human-AI Alignment‼️🤔 Intention understanding and perspective-taking are core theory-of-mind abilities that humans typically develop starting around age 3. However, in VLMs, these two abilities dissociate. 🧐
📅 April 28, Garnet 216-214 openreview.net/forum?id=rmHnN…
👏 GrowAIlikeAChild

Hokin Deng (@denghokin)

#ICLR Spurious Correlation & Shortcut Learning Workshop‼️ Introducing "Concept-Hacking" 🙀 We evaluated 209 models and found all of them are stochastic parrots 🦜

🙀 Models either believe "the bigger the ball, the quicker it falls" (illusions) or "no matter how big, they fall at the same time" (shortcuts). 😲
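
The tweets describe the concept-hacking probe only informally; a minimal sketch of the classification logic they imply, assuming a hypothetical query_vlm helper (not the authors' actual harness), could look like this:

```python
# Hypothetical sketch of the "concept-hacking" probe described above.
# Each control stimulus (consistent with real physics) is paired with a
# manipulated twin whose physics is deliberately altered.

def query_vlm(model, image, question) -> str:
    """Placeholder for calling the VLM under evaluation."""
    raise NotImplementedError

def classify(model, control_img, manip_img, question, gold_control, gold_manip):
    a_control = query_vlm(model, control_img, question)
    a_manip = query_vlm(model, manip_img, question)
    if a_control == gold_control and a_manip == gold_manip:
        return "grounded"   # tracks the depicted physics in both conditions
    if a_control != gold_control:
        return "illusion"   # e.g. "the bigger the ball, the quicker it falls"
    return "shortcut"       # parrots the textbook answer despite the manipulation
```
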
Hokin Deng (@denghokin)

#ICLR 💐 GrowAIlikeAChild We have 4 posters tomorrow‼️
Evaluating Multi-modal Language Models Through Concept Hacking 🦜 Spurious Correlation & Shortcut Learning Workshop
openreview.net/forum?id=B2QXX…
Vision Language Models See What You Want but not What You See 😍🙈 Bidirectional Human-AI Alignment
openreview.net/forum?id=rmHnN…
Probing Mechanical Reasoning in Large Vision Language Models…

Zory Zhang (@zory_zhang)

👁️ 𝐂𝐚𝐧 𝐕𝐢𝐬𝐢𝐨𝐧 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐕𝐋𝐌𝐬) 𝐈𝐧𝐟𝐞𝐫 𝐇𝐮𝐦𝐚𝐧 𝐆𝐚𝐳𝐞 𝐃𝐢𝐫𝐞𝐜𝐭𝐢𝐨𝐧? Knowing where someone looks is key to a Theory of Mind. We test 111 VLMs and 65 humans to compare their inferences. Project page: grow-ai-like-a-child.github.io/gaze/ 🧵1/11

Martin Ziqiao Ma (@ziqiao_ma)

Gaze has been on my mind for a long time. In real life, we don’t use mouse cursors—we use gaze, head turns, and gestures to refer nonverbally. We (GrowAIlikeAChild) ask: can VLMs interpret gaze like humans do? Spoiler: they mostly chase head direction, not eye gaze. Having…
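
The head-versus-eye contrast can be made concrete with a small scoring sketch; everything here (cue_followed, the trial encoding) is a hypothetical illustration, not the project's actual analysis code:

```python
# Hypothetical sketch: on stimuli where head direction and eye direction
# point at different targets, count which cue each responder follows.
from collections import Counter

def cue_followed(answer, head_target, eye_target):
    """Label one response as tracking the head, the eyes, or neither."""
    if answer == eye_target:
        return "eye"
    if answer == head_target:
        return "head"
    return "other"

def cue_profile(answers, trials):
    """answers: one choice per trial; trials: (head_target, eye_target) pairs."""
    return Counter(cue_followed(a, h, e) for a, (h, e) in zip(answers, trials))

# A VLM profile dominated by "head" alongside a human profile dominated by
# "eye" would be exactly the dissociation the tweet describes.
```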

Hokin Deng (@denghokin)

#ICML #cognition #GrowAI We spent 2 years carefully curating every single experiment (e.g., object permanence, the A-not-B task, the visual cliff task) in this dataset (total: 1503 classic experiments spanning 12 core cognitive concepts).

We spent another year evaluating 230 MLLMs…
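
At that scale (230 models × 11 prompt variants over 1503 experiments), the evaluation harness is essentially a nested sweep. A minimal sketch under that assumption, with run_model as a hypothetical stand-in for each MLLM's API:

```python
# Hypothetical sketch of the sweep implied by the numbers above:
# every model sees every experiment under every prompt variant.
import itertools

def run_model(model_id, prompt, stimulus):
    """Placeholder for calling one MLLM on one stimulus."""
    raise NotImplementedError

def sweep(models, prompts, experiments):
    results = []
    for model_id, prompt, exp in itertools.product(models, prompts, experiments):
        answer = run_model(model_id, prompt, exp["stimulus"])
        results.append({
            "model": model_id,
            "prompt": prompt,
            "concept": exp["concept"],   # one of the 12 core concepts
            "correct": answer == exp["gold"],
        })
    return results
```
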
Hokin Deng (@denghokin)

🚀 Dive in 👇
🌐 Project: williamium3000.github.io/core-knowledge/
📄 Paper: arxiv.org/abs/2410.10855
📝 OpenReview: openreview.net/forum?id=EIK6x…
📊 Dataset: huggingface.co/datasets/willi…
💻 Code: github.com/williamium3000…
😎 Team: growing-ai-like-a-child.github.io
💥 Huge thanks to my amazing collaborators

William Yijiang Li (@williamiumli)

🔥 Huge thanks to Yann LeCun and everyone for reposting our #ICML2025 work! 🚀 ✨12 core abilities, 📚1503 tasks, 🤖230 MLLMs, 🗨️11 prompts, 📊2503 data points. 🧠 We try to answer the question: 🔍 Do Multi-modal Large Language Models have grounded perception and reasoning?

Hokin Deng (@denghokin)

Thanks to SenseTime for comprehensively investigating our #CoreCognition framework and evaluating OpenAI's GPT-5 on it.

Interestingly, the GPT-5 models have achieved significant improvements on the concrete operational stage (Alexander Wei, Noam Brown), namely object permanence, intuitive…
Zhongang Cai (@caizhongang)

Hokin Deng SenseTime OpenAI Alexander Wei Noam Brown YuanLiuuuuuu Yubo Wang Brian Bo Li Ziwei Liu Thanks, Hokin! CoreCognition has been a major source of inspiration for us. What we find particularly fascinating is how perspective-taking seems largely uncorrelated with other multimodal capabilities. Congrats to your team too, and we’re excited to see what’s next. 😄

<a href="/DengHokin/">Hokin Deng</a> <a href="/SenseTime_AI/">SenseTime</a> <a href="/OpenAI/">OpenAI</a> <a href="/alexwei_/">Alexander Wei</a> <a href="/polynoamial/">Noam Brown</a> <a href="/a33668874586/">YuanLiuuuuuu</a> <a href="/Yubo_Wang1206/">Yubo Wang</a> <a href="/BoLi68567011/">Brian Bo Li</a> <a href="/liuziwei7/">Ziwei Liu</a> Thanks, Hokin! CoreCognition has been a major source of inspiration for us. What we find particularly fascinating is how perspective-taking seems largely uncorrelated with other multimodal capabilities. Congrats to your team too, and we’re excited to see what’s next. 😄
Lei Yang (@drowsyleilei)

Thanks to Hokin Deng for the attention to our work! We’re inspired by many insights in #CoreCognition. In particular, Fig. 6 is a key motivation: it shows perspective-taking as a unique ability with low correlation to the others, even though CoreCognition isn’t specifically about spatial intelligence.
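
Mechanically, the low-correlation observation in Fig. 6 is a pairwise correlation over per-model ability scores. A hedged sketch of that computation (the ability names and data layout are invented for illustration):

```python
# Hypothetical sketch: given per-model accuracy on each core ability,
# check how strongly perspective-taking co-varies with the rest.
import numpy as np

def ability_correlations(scores: dict[str, list[float]], target="perspective_taking"):
    """scores maps ability name -> one accuracy per evaluated model."""
    t = np.asarray(scores[target])
    return {
        name: float(np.corrcoef(t, np.asarray(vals))[0, 1])
        for name, vals in scores.items()
        if name != target
    }

# A profile of near-zero correlations for "perspective_taking" would be the
# dissociation highlighted in the exchange above.
```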

Hokin Deng (@denghokin)

#GrowAI #cognition #philoLLM I am very excited that many frontier LM scientists are starting to look into insights from human cognition and philosophy of mind. I talked to many philosophers recently and conveyed to them that frontier AI practitioners are not just engineers who chase…