Chengzu Li (@li_chengzu)'s Twitter Profile
Chengzu Li

@li_chengzu

PhD student in NLP @CambridgeLTL @JesusCollegeCam

ID: 1571803139069673473

Link: https://chengzu-li.github.io/ · Joined: 19-09-2022 10:07:34

58 Tweets

327 Followers

162 Following

CambridgeLTL (@cambridgeltl)'s Twitter Profile Photo

Extremely happy to share that our PhD student Tiancheng Hu received the Apple Scholars in AI/ML PhD Fellowship! 🎉 The fellowship will support his research on LLM-based simulation and LLM personalisation. Congratulations again, Tiancheng Hu! 🥳 machinelearning.apple.com/updates/apple-…

Bowen Wang (@bowenwangnlp)'s Twitter Profile Photo

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL and more 🕹️ Test agents on 100+ real apps & websites with one-click config 🔒 Safe & free

Chengzu Li (@li_chengzu)'s Twitter Profile Photo

Happy to share that MVoT got accepted to ICML 2025 🎉🎉 #ICML If you are interested, do check out our paper; here are some other materials: 📰 Report on IEEE Spectrum: spectrum.ieee.org/visual-reasoni… 🎤 TWIML Podcast with Sam: twimlai.com/podcast/twimla…

Emile van Krieken (@emilevankrieken)'s Twitter Profile Photo

We propose Neurosymbolic Diffusion Models! We find diffusion is especially compelling for neurosymbolic approaches, combining powerful multimodal understanding with symbolic reasoning 🚀 Read more 👇

Benjamin Minixhofer (@bminixhofer)'s Twitter Profile Photo

We achieved the first instance of successful subword-to-byte distillation in our (just updated) paper. This enables creating byte-level models at a fraction of the cost of what was needed previously. As a proof-of-concept, we created byte-level Gemma2 and Llama3 models. 🧵

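
The tweet names the recipe but not the core difficulty: the teacher predicts subwords while the student predicts bytes, so their distributions live over different vocabularies. As a toy illustration of one way to bridge that gap (purely hypothetical, not the paper's actual method), a next-subword distribution can be projected onto next-byte targets by marginalising over subwords that share a first byte:

```python
import math

def project_subword_to_bytes(subword_probs, vocab):
    """Collapse a next-subword distribution into a next-byte distribution
    by summing the mass of every subword starting with a given byte."""
    byte_probs = {}
    for p, token in zip(subword_probs, vocab):
        first = token.encode("utf-8")[0]
        byte_probs[first] = byte_probs.get(first, 0.0) + p
    return byte_probs

def distill_loss(student_byte_probs, teacher_subword_probs, vocab, eps=1e-12):
    """Cross-entropy of the student's byte distribution against the
    projected teacher targets -- the generic distillation objective."""
    target = project_subword_to_bytes(teacher_subword_probs, vocab)
    return -sum(p * math.log(student_byte_probs.get(b, 0.0) + eps)
                for b, p in target.items())

vocab = ["the", "there", "cat", "dog"]
teacher = [0.5, 0.2, 0.2, 0.1]   # teacher's next-subword probabilities
target = project_subword_to_bytes(teacher, vocab)
# "the" and "there" share first byte 't', so P(byte 't') = 0.5 + 0.2 = 0.7
```

Training the byte student against such projected targets, instead of raw text, is what would let it reuse the subword teacher's knowledge at a fraction of the pretraining cost.
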
Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🚀 New Paper: Pixel Reasoner 🧠🖼️ How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself? We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement

Jiaang Li (@jiaangli)'s Twitter Profile Photo

🚀New Preprint Alert 🚀 Can Multimodal Retrieval Enhance Cultural Awareness in Vision-Language Models? Excited to introduce RAVENEA, a new benchmark aimed at evaluating cultural understanding in VLMs through RAG.

Yinya Huang ✈️ ICLR (@yinyahuang)'s Twitter Profile Photo

🤖⚛️Can AI truly see Physics? Test your model with the newly released SeePhys Benchmark! 🚀 🖼️Covering 2,000 vision-text multimodal physics problems spanning from middle school to doctoral qualification exams, the SeePhys benchmark systematically evaluates LLMs/MLLMs on tasks

Caiqi Zhang (@caiqizh)'s Twitter Profile Photo

🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation. 🤩No sampling. No slow post-hoc methods. Not limited to short-form QA! ‼️Just output confidence in a single decoding pass. ✅Better calibration! 🚀 20× faster runtime. arXiv:2505.23912 👇

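
The key mechanism in the bullets is that confidence is emitted inline during the single decoding pass, not estimated afterwards by sampling or a post-hoc scorer. A toy sketch of the interface (hypothetical: it uses the geometric mean of token probabilities as a stand-in where the trained model would emit its own confidence tag):

```python
import math

def decode_with_confidence(stream):
    """Single-pass sketch: accumulate per-token probabilities and, at each
    sentence boundary, emit an inline confidence tag computed from the
    geometric mean of that sentence's token probabilities."""
    out, logps = [], []
    for token, prob in stream:
        out.append(token)
        logps.append(math.log(prob))
        if token.endswith("."):
            conf = math.exp(sum(logps) / len(logps))
            out.append(f"<conf={conf:.2f}>")
            logps = []
    return " ".join(out)

# Simulated (token, probability) pairs standing in for a real decoder.
stream = [("Paris", 0.9), ("is", 0.95), ("the", 0.99), ("capital.", 0.9)]
print(decode_with_confidence(stream))  # → Paris is the capital. <conf=0.93>
```

Because the tag is produced in the same forward pass as the text, the extra cost is a few tokens per claim, which is where a 20× speedup over sampling-based estimators would come from.
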
Han Zhou (@hanzhou032)'s Twitter Profile Photo

Automating Multi-Agent Design: 🧩Multi-agent systems aren’t just about throwing more LLM agents together. 🛠️They require mastering the subtle art of prompting and agent orchestration. Introducing MASS🚀- Our new agent optimization framework for better prompts and topologies!

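
The search space the tweet describes is the cross product of prompt designs and agent topologies. A minimal sketch of that joint search (brute force over a toy space with a hypothetical scoring function; MASS itself uses staged optimization, not exhaustive enumeration):

```python
import itertools

def optimize_agents(prompts, topologies, evaluate):
    """Score every (prompt, topology) combination on a validation task
    and keep the best -- illustrating the joint search space only."""
    return max(itertools.product(prompts, topologies),
               key=lambda cfg: evaluate(*cfg))

prompts = ["terse", "step-by-step"]
topologies = ["chain", "debate", "aggregate"]
# Hypothetical scores standing in for validation accuracy.
scores = {("step-by-step", "debate"): 0.9}
best = optimize_agents(prompts, topologies,
                       lambda p, t: scores.get((p, t), 0.5))
print(best)  # → ('step-by-step', 'debate')
```

The point of optimizing both axes together is that the best prompt depends on the topology it runs in, so tuning them independently can miss the best configuration.
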
Yifu Qiu (@yifuqiu98)'s Twitter Profile Photo

🔁 What if you could bootstrap a world model (state1 × action → state2) using a much easier-to-train dynamics model (state1 × state2 → action) in a generalist VLM? 💡 We show how a dynamics model can generate synthetic trajectories & serve for inference-time verification 🧵👇
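
The two signatures in the tweet can be made concrete with a toy example (integer states with action = difference; purely illustrative, not the paper's models): the easier-to-train dynamics model labels action-free state sequences into world-model training triples, and later verifies the world model's predictions at inference time.

```python
def dynamics_model(s1, s2):
    """Dynamics model (state1 x state2 -> action): infer the action that
    maps s1 to s2. Toy version: states are integers, action is the delta."""
    return s2 - s1

def label_trajectory(states):
    """Bootstrap world-model data: turn an action-free state sequence
    into (state1, action, state2) triples via the dynamics model."""
    return [(s1, dynamics_model(s1, s2), s2)
            for s1, s2 in zip(states, states[1:])]

def verify(world_model, s1, action):
    """Inference-time check: does the dynamics model recover the same
    action from the world model's predicted next state?"""
    return dynamics_model(s1, world_model(s1, action)) == action

triples = label_trajectory([0, 3, 5, 9])
print(triples)  # → [(0, 3, 3), (3, 2, 5), (5, 4, 9)]
ok = verify(lambda s, a: s + a, 2, 7)
```

The asymmetry being exploited is that inferring an action from two observed states is a much easier learning problem than predicting a full next state, so the cheap model can both generate and check data for the expensive one.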

Zhoujun (Jorge) Cheng (@chengzhoujun)'s Twitter Profile Photo

🤯What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM reasoning on six domains (Math, Code, Science, Logic, Simulation, Tabular) and found that previous conclusions drawn on math and code are surprisingly

Zhaochen Su (@suzhaochen0110)'s Twitter Profile Photo

Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…

Tiancheng Hu (@tiancheng_hu)'s Twitter Profile Photo

Working on LLM social simulation and need data? Excited to announce our iNews paper is accepted to #ACL2025! 🥳 It's a large-scale dataset for predicting individualized affective responses to real-world, multimodal news. arxiv.org/abs/2503.03335 🤗 Data: huggingface.co/datasets/piteh…

Micah Goldblum (@micahgoldblum)'s Twitter Profile Photo

🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n
