GrowAIlikeAChild (@growailikechild)'s Twitter Profile
GrowAIlikeAChild

@growailikechild

growing-ai-like-a-child.github.io

ID: 1888060445111504896

Joined: 08-02-2025 03:02:02

15 Tweets

42 Followers

7 Following

Martin Ziqiao Ma (@ziqiao_ma)

Vision-Language Models (VLMs) can describe the environment, but can they refer within it? Our findings reveal a critical gap: VLMs fall short of pragmatic optimality. We identify 3 key failures of pragmatic competence in referring expression generation with VLMs: (1) cannot…

Martin Ziqiao Ma (@ziqiao_ma)

P.S. We are building GrowAIlikeAChild, an open-source community uniting researchers from computer science, cognitive science, psychology, linguistics, philosophy, and beyond. Instead of putting growing up and scaling up into opposite camps, let's build and evaluate human-like AI…

Hokin Deng (@denghokin)

🔨🔧⚙️ Have Vision Language Models solved mechanical reasoning?

If you are at ICLR 2025, please come and check out our poster "Probing Mechanical Reasoning in Large Vision Language Models" at Bidirectional Human-AI Alignment #ICLR2025!
📷 Room Garnet 216-214
🗓️ April 28

My work with GrowAIlikeAChild

Hokin Deng (@denghokin)

#ICLR please check out our poster‼️ We evaluated 209 models and all of them are stochastic parrots 🦜

🙀 Models either believe "the bigger the ball, the quicker it falls" (illusions) or "no matter how big (physics textbook, Tower of Pisa), they fall at the same time" (shortcuts).

Hokin Deng (@denghokin)

#ICLR2025 #ICLR Small poster, big insights ⁉️

Vision Language Models Know Law of Conservation without Understanding More-or-Less 🙀🙀

Come to our poster at ICLR 2025, Bidirectional Human-AI Alignment
📷 Room Garnet 216-214
❗️April 28
‼️Work by GrowAIlikeAChild

Hokin Deng (@denghokin)

#ICLR Bidirectional Human-AI Alignment‼️🤔 Intention understanding and perspective-taking are core theory-of-mind abilities that humans typically develop starting around age 3. However, in VLMs, these two abilities dissociate. 🧐
📅 April 28, Garnet 216-214 openreview.net/forum?id=rmHnN…
👏 GrowAIlikeAChild

Hokin Deng (@denghokin)

#ICLR Spurious Correlation & Shortcut Learning Workshop‼️ Introducing "Concept-Hacking" 🙀 We evaluated 209 models and found all of them are stochastic parrots 🦜

🙀 Models either believe "the bigger the ball, the quicker it falls" (illusions) or "no matter how big, they fall at the same time" (shortcuts). 😲
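
The tweets describe the concept-hacking probe only informally; a minimal sketch of the classification logic they imply, assuming a hypothetical query_vlm helper (not the authors' actual harness), could look like this:

```python
# Hypothetical sketch of the "concept-hacking" probe described above.
# Each control stimulus (consistent with real physics) is paired with a
# manipulated twin whose physics is deliberately altered.

def query_vlm(model, image, question) -> str:
    """Placeholder for calling the VLM under evaluation."""
    raise NotImplementedError

def classify(model, control_img, manip_img, question, gold_control, gold_manip):
    a_control = query_vlm(model, control_img, question)
    a_manip = query_vlm(model, manip_img, question)
    if a_control == gold_control and a_manip == gold_manip:
        return "grounded"   # tracks the depicted physics in both conditions
    if a_control != gold_control:
        return "illusion"   # e.g. "the bigger the ball, the quicker it falls"
    return "shortcut"       # parrots the textbook answer despite the manipulation
```
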
Hokin Deng (@denghokin)

#ICLR 💐 GrowAIlikeAChild We have 4 posters tomorrow‼️
Evaluating Multi-modal Language Models Through Concept Hacking 🦜 Spurious Correlation & Shortcut Learning Workshop
openreview.net/forum?id=B2QXX…
Vision Language Models See What You Want but not What You See 😍🙈 Bidirectional Human-AI Alignment
openreview.net/forum?id=rmHnN…
Probing Mechanical Reasoning in Large Vision Language Models…

Zory Zhang (@zory_zhang)

👁️ 𝐂𝐚𝐧 𝐕𝐢𝐬𝐢𝐨𝐧 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐕𝐋𝐌𝐬) 𝐈𝐧𝐟𝐞𝐫 𝐇𝐮𝐦𝐚𝐧 𝐆𝐚𝐳𝐞 𝐃𝐢𝐫𝐞𝐜𝐭𝐢𝐨𝐧? Knowing where someone looks is key to a Theory of Mind. We test 111 VLMs and 65 humans to compare their inferences. Project page: grow-ai-like-a-child.github.io/gaze/ 🧵1/11

Martin Ziqiao Ma (@ziqiao_ma)

Gaze has been on my mind for a long time. In real life, we don’t use mouse cursors—we use gaze, head turns, and gestures to refer nonverbally. We (GrowAIlikeAChild) ask: can VLMs interpret gaze like humans do? Spoiler: they mostly chase head direction, not eye gaze. Having…
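
The head-versus-eye contrast can be made concrete with a small scoring sketch; everything here (cue_followed, the trial encoding) is a hypothetical illustration, not the project's actual analysis code:

```python
# Hypothetical sketch: on stimuli where head direction and eye direction
# point at different targets, count which cue each responder follows.
from collections import Counter

def cue_followed(answer, head_target, eye_target):
    """Label one response as tracking the head, the eyes, or neither."""
    if answer == eye_target:
        return "eye"
    if answer == head_target:
        return "head"
    return "other"

def cue_profile(answers, trials):
    """answers: one choice per trial; trials: (head_target, eye_target) pairs."""
    return Counter(cue_followed(a, h, e) for a, (h, e) in zip(answers, trials))

# A VLM profile dominated by "head" alongside a human profile dominated by
# "eye" would be exactly the dissociation the tweet describes.
```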

Hokin Deng (@denghokin)

#ICML #cognition #GrowAI We spent 2 years carefully curating every single experiment (e.g., object permanence, the A-not-B task, the visual cliff task) in this dataset (total: 1503 classic experiments spanning 12 core cognitive concepts).

We spent another year evaluating 230 MLLMs…
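
At that scale (230 models × 11 prompt variants over 1503 experiments), the evaluation harness is essentially a nested sweep. A minimal sketch under that assumption, with run_model as a hypothetical stand-in for each MLLM's API:

```python
# Hypothetical sketch of the sweep implied by the numbers above:
# every model sees every experiment under every prompt variant.
import itertools

def run_model(model_id, prompt, stimulus):
    """Placeholder for calling one MLLM on one stimulus."""
    raise NotImplementedError

def sweep(models, prompts, experiments):
    results = []
    for model_id, prompt, exp in itertools.product(models, prompts, experiments):
        answer = run_model(model_id, prompt, exp["stimulus"])
        results.append({
            "model": model_id,
            "prompt": prompt,
            "concept": exp["concept"],   # one of the 12 core concepts
            "correct": answer == exp["gold"],
        })
    return results
```
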
Hokin Deng (@denghokin)

🚀 Dive in 👇
🌐 Project: williamium3000.github.io/core-knowledge/
📄 Paper: arxiv.org/abs/2410.10855
📝 OpenReview: openreview.net/forum?id=EIK6x…
📊 Dataset: huggingface.co/datasets/willi…
💻 Code: github.com/williamium3000…
😎 Team: growing-ai-like-a-child.github.io
💥 Huge thanks to my amazing collaborators

William Yijiang Li (@williamiumli)

🔥 Huge thanks to Yann LeCun and everyone for reposting our #ICML2025 work! 🚀 ✨12 core abilities, 📚1503 tasks, 🤖230 MLLMs, 🗨️11 prompts, 📊2503 data points. 🧠 We try to answer the question: 🔍 Do Multi-modal Large Language Models have grounded perception and reasoning?

Hokin Deng (@denghokin)

Thanks to SenseTime for comprehensively investigating our #CoreCognition framework and evaluating OpenAI's GPT-5 on it.

Interestingly, the GPT-5 models have achieved significant improvements on the concrete operational stage (Alexander Wei, Noam Brown), namely object permanence, intuitive…
Zhongang Cai (@caizhongang)

Hokin Deng SenseTime OpenAI Alexander Wei Noam Brown YuanLiuuuuuu Yubo Wang Brian Bo Li Ziwei Liu Thanks, Hokin! CoreCognition has been a major source of inspiration for us. What we find particularly fascinating is how perspective-taking seems largely uncorrelated with other multimodal capabilities. Congrats to your team too, and we’re excited to see what’s next. 😄

<a href="/DengHokin/">Hokin Deng</a> <a href="/SenseTime_AI/">SenseTime</a> <a href="/OpenAI/">OpenAI</a> <a href="/alexwei_/">Alexander Wei</a> <a href="/polynoamial/">Noam Brown</a> <a href="/a33668874586/">YuanLiuuuuuu</a> <a href="/Yubo_Wang1206/">Yubo Wang</a> <a href="/BoLi68567011/">Brian Bo Li</a> <a href="/liuziwei7/">Ziwei Liu</a> Thanks, Hokin! CoreCognition has been a major source of inspiration for us. What we find particularly fascinating is how perspective-taking seems largely uncorrelated with other multimodal capabilities. Congrats to your team too, and we’re excited to see what’s next. 😄
Lei Yang (@drowsyleilei)

Thanks to Hokin Deng for the attention to our work! We’re inspired by many insights in #CoreCognition. In particular, Fig. 6 is a key motivation: it shows perspective-taking as a unique ability with low correlation to the others, even though CoreCognition isn’t specifically about spatial intelligence.
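
Mechanically, the low-correlation observation in Fig. 6 is a pairwise correlation over per-model ability scores. A hedged sketch of that computation (the ability names and data layout are invented for illustration):

```python
# Hypothetical sketch: given per-model accuracy on each core ability,
# check how strongly perspective-taking co-varies with the rest.
import numpy as np

def ability_correlations(scores: dict[str, list[float]], target="perspective_taking"):
    """scores maps ability name -> one accuracy per evaluated model."""
    t = np.asarray(scores[target])
    return {
        name: float(np.corrcoef(t, np.asarray(vals))[0, 1])
        for name, vals in scores.items()
        if name != target
    }

# A profile of near-zero correlations for "perspective_taking" would be the
# dissociation highlighted in the exchange above.
```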

Hokin Deng (@denghokin)

#GrowAI #cognition #philoLLM I am very excited that many frontier LM scientists are starting to look into insights from human cognition and philosophy of mind. I talked to many philosophers recently and conveyed to them that frontier AI practitioners are not just engineers who chase…