Manu Gaur
@gaur_manu
Used to do physics, now multiplying matrices @IIIT_Hyderabad Prev @amazon, @UTSResearch
ID: 588134988
https://manugaurdl.github.io/ 23-05-2012 06:14:57
2,2K Tweet
213 Takipçi
913 Takip Edilen
Rohan Choudhury I think the point is that we can use encoders like dino, siglip and that semantic representations arent as lossy as we had thought initially - cnn, vit doesn’t really matter
Chris Offner Alexandre Morgand That's a curve ball question but here's my intuition/hypothesis: 1. Locality of Task: Depth estimation driven by priors can be primarily thought of as a local task, i.e., given semantic context of things in the image you can predict the relative depth - hence why also linear