Alex Zhang (@a1zhang)'s Twitter Profile
Alex Zhang

@a1zhang

incoming phd student @MIT_CSAIL, @vant_ai, @princeton ‘24 | 🫵🏻 go participate in the @GPU_MODE kernel competition!!!

ID: 4593727300

Link: http://alexzhang13.github.io/blog

Joined: 24-12-2015 22:30:58

168 Tweets

11.11K Followers

415 Following

So how well do the best VLMs (e.g. Gemini 2.5 Pro, GPT-4o, Claude 3.7) perform on VideoGameBench? 🥁 Really bad! Most models can’t progress at all in any games on VideoGameBench, which span a wide range of genres like platformers, FPS, RTS, RPGs, and more!