🚀 Check out our new benchmark GenExam! It formulates text-to-image generation as graph-drawing exams, with 10 disciplines and 1,000 questions.
The top model, GPT-4o, scores just 12.1%; open-source models score near 0%.
Paper: arxiv.org/abs/2509.14232
GitHub: github.com/OpenGVLab/GenE…