
Hugo
@mldhug
PhD student in multimodal learning for audio understanding at @telecomparis
ID: 1162859120988438528
17-08-2019 22:50:03
19 Tweets
45 Followers
403 Following

We'll present GeneCIS at #CVPR2023 (Highlight). TL;DR: While most image representations are *fixed*, we present a general way to train and evaluate models that can adapt to different *conditions* on the fly. Code: github.com/facebookresear… Project page: sgvaze.github.io/genecis/ 🧵

Thanks for tweeting, @AK! We're super excited about the future of text-only vision model selection! Mars Huang, Jackson (Kuan-Chieh) Wang, @cvpr, @syeung10


Who killed non-contrastive image-text pretraining? Alec Radford and Jong Wook Kim, with the below Fig. 2 in CLIP. Who collected the seven Dragon Balls and asked Shenron to resurrect it? Yours truly, in this new paper of ours. Generative captioning is not only competitive, it seems better!


Why is Whisper so robust to background noise? Not because Whisper suppresses it, but because Whisper *understands* it! Check out the amazing work by Yuan Gong. They reveal this emergent capability of Whisper, and get SOTA *simultaneous* ASR + audio tagging.

If you want to learn more about audio-visual alignment and how to use it to give audio abilities to your VLM, stop by our NeurIPS poster #3602 (East Exhibit Hall A-C) tomorrow at 11am!
