Omar Alama عمر الأعمى
@omaralama
ECE Vision and Robot Perception PhD @ Carnegie Mellon University
ID: 700919826
https://www.linkedin.com/in/omar-alama-651442169/ 17-07-2012 11:54:07
153 Tweets
179 Followers
345 Following
SIGLIP wins over CLIP even in dense tasks like zero-shot open-vocab semantic segmentation on Replica. Using the RayFronts encoder (NA attention + RADIO [Pavlo Molchanov] + SIGLIP [Lucas Beyer (bl16)]) with projection to the CLS token gives you SoTA performance. No more SAM+CROP+CLIP business.
I was waiting for the AI to make a mistake the whole time. I was shocked by the quality. It was even simplifying new concepts introduced in our paper with analogies. NotebookLM is a really impressive tool. Listen to the full podcast here: youtu.be/_tIVlw1Wrh4?si…