
Vittorio Ferrari
@vittoferraricv
Director of Science at Synthesia.io
ID: 1275184138664976384
https://sites.google.com/view/vittoferrari
22-06-2020 21:50:10
60 Tweets
5.5K Followers
12 Following

Introducing HAMMR: hierarchical multimodal agents that handle a broad range of VQA tasks within a single system (counting, spatial reasoning, OCR, visual pointing, external knowledge, and more). arxiv.org/abs/2404.05465 With Lluís Castrejón, @tejmensink, Howard Zhou, André Araujo, and Jasper Uijlings.
