Hugo (@mldhug)'s Twitter Profile
Hugo

@mldhug

PhD student in multimodal learning for audio understanding at @telecomparis

ID: 1162859120988438528

Joined: 17-08-2019 22:50:03

19 Tweets

45 Followers

403 Following

Sagar Vaze (@sagar_vaze)'s Twitter Profile Photo

We'll present GeneCIS at #CVPR2023 (Highlight) TL;DR: While most image representations are *fixed*, we present a general way to train and evaluate models which can adapt to different *conditions* on the fly. Code: github.com/facebookresear… Project page: sgvaze.github.io/genecis/ 🧵
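
A minimal sketch of the conditional-similarity idea (not the GeneCIS code; the fusion module, dimensions, and use of frozen CLIP-style features are all assumptions): a fixed image embedding is fused with a condition embedding before similarity is computed, so the same query ranks a gallery differently under different conditions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionedEmbedder(nn.Module):
    """Toy conditional similarity: fuse a fixed image embedding with a
    condition embedding, so one image can be close to different gallery
    items depending on the condition."""
    def __init__(self, dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, img_emb, cond_emb):
        z = self.fuse(torch.cat([img_emb, cond_emb], dim=-1))
        return F.normalize(z, dim=-1)

embedder = ConditionedEmbedder()
query = torch.randn(1, 512)   # frozen image feature (e.g. from CLIP)
cond = torch.randn(1, 512)    # embedded text condition
gallery = F.normalize(torch.randn(100, 512), dim=-1)
scores = embedder(query, cond) @ gallery.T   # rank gallery under this condition
best = scores.argmax(dim=-1)
```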

AK (@_akhaliq)'s Twitter Profile Photo

Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration

paper page: huggingface.co/papers/2306.09…

Although instruction-tuned large language models (LLMs) have exhibited remarkable capabilities across various NLP tasks, their effectiveness on other data
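
Models in this family typically integrate modalities by projecting each frozen encoder's features into the LLM's token-embedding space and prepending them to the text sequence. A schematic sketch of that pattern, with every module and dimension here a stand-in rather than Macaw-LLM's actual code:

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Map frozen encoder features (image/audio/video) into the LLM's
    embedding space so they can be prepended as 'soft tokens'."""
    def __init__(self, feat_dim, llm_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, feats):        # (batch, seq, feat_dim)
        return self.proj(feats)      # (batch, seq, llm_dim)

llm_dim = 4096
image_tokens = ModalityProjector(1024, llm_dim)(torch.randn(1, 16, 1024))
audio_tokens = ModalityProjector(768, llm_dim)(torch.randn(1, 8, 768))
text_tokens = torch.randn(1, 32, llm_dim)    # embedded instruction text
# the concatenated sequence is what the instruction-tuned LLM consumes
inputs = torch.cat([image_tokens, audio_tokens, text_tokens], dim=1)
```
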
Wei-Ning Hsu (@mhnt1580)'s Twitter Profile Photo

Super excited to finally launch Voicebox🤩, the most versatile speech generative model ever💬👄 Demo page: voicebox.metademolab.com

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

Who killed non-contrastive image-text pretraining? Alec Radford and Jong Wook Kim 💟 with the below Fig2 in CLIP.

Who collected the 7 Dragonballs and asked Shenron to resurrect it? Yours truly, in this new paper of ours.

Generative captioning is not only competitive, it seems better!
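
The two objectives being compared fit in a few lines: CLIP's contrastive InfoNCE versus plain next-token cross-entropy on generated captions. A generic sketch for reference, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style InfoNCE: each image should match its own caption
    embedding against all others in the batch (and vice versa)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature
    targets = torch.arange(len(img))
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

def captioning_loss(token_logits, caption_ids):
    """Generative alternative: predict caption tokens autoregressively
    from the image; ordinary next-token cross-entropy."""
    return F.cross_entropy(token_logits.reshape(-1, token_logits.size(-1)),
                           caption_ids.reshape(-1))
```
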
Jack (in SF) Langerman (@jacklangerman)'s Twitter Profile Photo

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

Looks promising; I'll have to try and see if it stands up to some poking ;-)

Love that they get around the need for multimodal training.

ar5iv.org/abs/2306.16410
github.com/ContextualAI/l…
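
The move the tweet highlights (no multimodal training) amounts to serializing frozen vision models' outputs into a text prompt for a frozen LLM. A schematic sketch with placeholder components; the repo's actual API will differ:

```python
def describe_image(image, tagger, captioner):
    """Turn an image into text with off-the-shelf frozen vision models;
    `tagger` and `captioner` are assumed callables (e.g. CLIP-based tag
    scoring and an image captioner)."""
    tags = tagger(image)        # e.g. ["dog", "frisbee", "park"]
    caption = captioner(image)  # e.g. "a dog catching a frisbee"
    return f"Tags: {', '.join(tags)}\nCaption: {caption}"

def answer_about_image(image, question, llm, tagger, captioner):
    # the LLM never sees pixels, only the textual description
    prompt = (describe_image(image, tagger, captioner)
              + f"\nQuestion: {question}\nAnswer:")
    return llm(prompt)
```
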
AK (@_akhaliq)'s Twitter Profile Photo

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

paper page: huggingface.co/papers/2306.16…

The paper introduces DreamDiffusion, a novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals, without the need to translate
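
As the abstract describes it, the core move is to condition an image diffusion model on learned EEG features instead of text. A schematic of that conditioning path, with every module name and size here an assumption:

```python
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    """Toy stand-in: map a raw EEG window (channels x time) into the
    embedding space a diffusion model expects as conditioning."""
    def __init__(self, channels=128, time=512, cond_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * time, 1024), nn.GELU(),
            nn.Linear(1024, cond_dim),
        )

    def forward(self, eeg):
        return self.net(eeg)

eeg = torch.randn(4, 128, 512)    # batch of EEG windows
cond = EEGEncoder()(eeg)          # (4, 768) conditioning vectors
# `cond` would stand in for the text embedding fed to a pretrained
# latent diffusion model's cross-attention layers
```
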
Puyuan Peng (@puyuanpeng)'s Twitter Profile Photo

Why is Whisper so robust to background noise? Not because Whisper suppresses them, but because Whisper 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐬 them!

Check out the amazing work by Yuan Gong. They reveal this emergent capability of Whisper, and get SOTA *simultaneous* ASR + audio tagging
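
One way to check the claim yourself is to probe Whisper's encoder states with a small audio-tagging head; if the encoder "understands" background sounds, they should be linearly decodable. A probing sketch assuming the openai-whisper package (the linear head, the 527-class AudioSet label space, and the file name are illustrative choices, not the authors' setup):

```python
import torch
import whisper  # openai-whisper package

model = whisper.load_model("base", device="cpu")
audio = whisper.load_audio("clip.wav")          # placeholder file
audio = whisper.pad_or_trim(audio)              # pad/trim to 30 s
mel = whisper.log_mel_spectrogram(audio).unsqueeze(0)

with torch.no_grad():
    feats = model.encoder(mel)   # (1, frames, d_model) encoder states

# hypothetical linear probe for audio tagging on pooled features
num_tags = 527                   # e.g. AudioSet classes
probe = torch.nn.Linear(feats.shape[-1], num_tags)
tag_logits = probe(feats.mean(dim=1))
```
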
Maksym Andriushchenko @ ICLR (@maksym_andr)'s Twitter Profile Photo

It's really surprising how far one can go with *linear* predictors in the autoregressive setting. 

Interesting theory and experiments on TinyStories: a linear model (with 162M params :-) ) can generate totally coherent text with few grammatical mistakes.

arxiv.org/abs/2309.06979
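
The model class is simple enough to state in code: next-token logits are a single linear map of the concatenated context embeddings, with no attention and no nonlinearity. A toy-sized sketch (the paper's 162M-parameter model corresponds to much larger vocabulary/context/embedding sizes; the numbers here are illustrative):

```python
import torch
import torch.nn as nn

class LinearLM(nn.Module):
    """Linear autoregressive predictor: logits for the next token are
    a linear function of the concatenated embeddings of the last
    `context` tokens. No attention, no nonlinearity."""
    def __init__(self, vocab=8192, dim=64, context=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.out = nn.Linear(context * dim, vocab, bias=False)

    def forward(self, ids):              # ids: (batch, context)
        e = self.embed(ids).flatten(1)   # (batch, context * dim)
        return self.out(e)               # (batch, vocab)

model = LinearLM()
next_logits = model(torch.randint(0, 8192, (2, 16)))
```
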
Salah Zaiem (@salah_zaiem)'s Twitter Profile Photo

Given a number of ASR models of different sizes, how can I allocate an audio sample to the smallest one that will be good enough? Hugo worked on this question during his internship, and ended up with interesting conclusions you will find in our paper!
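
In the abstract, the allocation problem can be posed as cascaded inference: try the cheapest ASR model first and escalate only when a confidence estimate falls below a threshold. A generic sketch of that pattern (not necessarily the paper's method; `transcribe` and `confidence` are assumed interfaces):

```python
def route_transcription(audio, models, confidence, threshold=0.9):
    """`models` is ordered smallest to largest; `confidence` is an
    assumed scorer for a transcript (e.g. mean token probability).
    Returns the first transcript judged good enough."""
    for model in models[:-1]:
        text = model.transcribe(audio)
        if confidence(model, audio, text) >= threshold:
            return text
    return models[-1].transcribe(audio)  # fall back to the largest model
```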

arXiv Sound (@arxivsound)'s Twitter Profile Photo

"An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment," Hugo Malard, Michel Olvera, Stéphane Lathuilière, Slim Essid, ift.tt/lf5BrIC
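
As the title suggests, the recipe is to align an audio encoder's output distribution with the visual-embedding distribution a frozen image captioner was trained on, so that audio embeddings become valid captioner inputs at inference time. A schematic of one possible paired alignment objective (the paper's actual distribution-alignment loss may differ):

```python
import torch
import torch.nn.functional as F

def alignment_loss(audio_emb, image_emb):
    """Pull paired audio/visual embeddings together so audio tokens
    can be fed to a frozen image captioner; a simple MSE + cosine
    objective as an illustrative stand-in."""
    mse = F.mse_loss(audio_emb, image_emb)
    cos = 1 - F.cosine_similarity(audio_emb, image_emb, dim=-1).mean()
    return mse + cos

# at inference: caption = frozen_captioner(audio_encoder(audio))
```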

Michel Olvera (@michelolzam)'s Twitter Profile Photo

Great talk today by Haohe Liu at the ADASP group on Latent Diffusion Models (LDMs) as versatile audio decoders! He walked us through diffusion basics, AudioLDM for text-to-audio, audio quality enhancement, and neural codecs!

Hugo (@mldhug)'s Twitter Profile Photo

If you want to learn more about audio-visual alignment and how to use it to give audio abilities to your VLM, stop by our NeurIPS Conference poster #3602 (East exhibit hall A-C) tomorrow at 11am!

Salah Zaiem (@salah_zaiem)'s Twitter Profile Photo

We are looking for audio and speech generation people in Zurich, Paris, or London to join our team at Google DeepMind. We build cutting-edge speech, music, and audio (also audio-visual) generation capabilities. Reach out to Jason or me if interested. Retweets very appreciated!