Mu Cai (@mucai7) 's Twitter Profile
Mu Cai

@mucai7

Research Scientist @GoogleDeepMind, Multimodal Large Language Models ex: Ph.D. @WisconsinCS | @MSFTResearch

ID: 1126468933676986368

Link: https://pages.cs.wisc.edu/~mucai/ Joined: 09-05-2019 12:48:17

197 Tweets

2.2K Followers

754 Following

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Gemini 2.5 Flash just dropped. ⚡ As a hybrid reasoning model, you can control how much it ‘thinks’ depending on your 💰 - making it ideal for tasks like building chat apps, extracting data and more. Try an early version in Google AI Studio → ai.dev

Mu Cai (@mucai7) 's Twitter Profile Photo

Totally agree. Models like #OpenAI's #o3 and #o4mini still cannot figure out basic geometry problems. If visual perception is wrong, then the "reasoning" part is meaningless. Huge room for improvement!

Mu Cai (@mucai7) 's Twitter Profile Photo

#OpenAI's #o3 and #o4mini once again demonstrate the power of visual prompting in ViP-LLaVA (CVPR 2024): vip-llava.github.io. In 2023, we proved that drawing hints visually is more effective than elaborating in text, especially for object-level understanding. Go for VisualThinking!

Xiang Li (@xiangli54505720) 's Twitter Profile Photo

Hi everyone! I hope you had a great time in Singapore🇸🇬. Though I could not be there in person, I'm excited to share our poster schedule at #ICLR2025. Feel free to stop by, check out our work, and bring any questions you have to Kanchana Ranasinghe.

Mu Cai (@mucai7) 's Twitter Profile Photo

I am excited to announce that I am not at #ICLR presenting Matryoshka Multimodal Models matryoshka-mm.github.io. 😀 But rather, I am online in the Bay Area. Ping me if you have any questions or ideas w.r.t. the paper! Feel free to read the poster at Hall 3 + Hall 2B #86 this morning!

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

We’re releasing an updated Gemini 2.5 Pro (I/O edition) to make it even better at coding. 🚀 You can build richer web apps, games, simulations and more - all with one prompt. In Google Gemini App, here's how it transformed images of nature into code to represent unique patterns 🌱

Mu Cai (@mucai7) 's Twitter Profile Photo

Thank you Yong Jae Lee! Without the support from you and our group members, it would have been impossible for me to produce this work. I'll miss the days working in our group.

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across Google. 🧵

Pushmeet Kohli (@pushmeet) 's Twitter Profile Photo

Excited to announce AlphaEvolve, a powerful AI coding agent developed by our team at Google DeepMind that is able to discover impactful new algorithms for important problems in Maths and Computing by combining the creativity of large language models with automated evaluators.

Logan Kilpatrick (@officiallogank) 's Twitter Profile Photo

Google's progress in AI since last year:
- The world's strongest models, on the Pareto frontier
- Gemini app: has over 400M monthly active users
- We now process 480T tokens a month, up 50x YoY
- Over 7M developers have built with the Gemini API (4x)
Much more to come still!

Feng Yao (@fengyao1909) 's Twitter Profile Photo

🔥 "Vibe coding" is everywhere—but is it really care-free? We introduce 𝐑𝐞𝐚𝐋, an RL framework that trains LLMs with automated program analysis feedback, enabling "vibe coding" to be not just fast—but 𝐯𝐮𝐥𝐧𝐞𝐫𝐚𝐛𝐢𝐥𝐢𝐭𝐲-𝐟𝐫𝐞𝐞 & 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧-𝐫𝐞𝐚𝐝𝐲 🛡️

Kangwook Lee (@kangwook_lee) 's Twitter Profile Photo

As a video gaming company, Krafton AI has secretly been cooking something big with NVIDIA AI for a while! 🥳 We introduce Orak, the first comprehensive video gaming benchmark for LLMs! arxiv.org/abs/2506.03610
