Xinhao Mei (@xinhao_mei) 's Twitter Profile
Xinhao Mei

@xinhao_mei

Research Scientist @ AI at Meta | PhD student @ University of Surrey.

ID: 1178271870329790464

Link: http://xinhaomei.github.io | Joined: 29-09-2019 11:34:55

33 Tweets

124 Followers

253 Following

Haohe Liu (@liuhaohe) 's Twitter Profile Photo

Can't wait to share our new Text-to-Audio model, AudioLDM. 😆 This video shows the generation result with a simple text prompt: "A music made by xxx". More demos coming soon!😉 The paper will be available next Monday on arXiv! 😊 Our model will be open-sourced soon!😎

AK (@_akhaliq) 's Twitter Profile Photo

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions

abs: arxiv.org/abs/2303.17395
Xinhao Mei (@xinhao_mei) 's Twitter Profile Photo

🔊 So excited to share our new work, WavCaps: a large-scale weakly-labelled audio captioning dataset. We utilize #ChatGPT to filter & transform noisy data into captions. See remarkable improvements over previous SOTA on multiple tasks! ☺️ Code: github.com/XinhaoMei/WavC…
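The tweet describes using ChatGPT to filter and transform noisy metadata into captions. As a rough illustration of that cleaning step, here is a minimal rule-based sketch — hypothetical heuristics only, not the actual WavCaps pipeline, and the function name `clean_raw_description` is invented for this example:

```python
import re

def clean_raw_description(raw, min_words=3):
    """Heuristic caption cleaner: a rule-based stand-in for the
    ChatGPT filtering/rewriting step. Returns None when the raw
    description is too noisy or too short to keep as a caption."""
    text = raw.strip()
    text = re.sub(r"#\w+", "", text)                            # drop hashtags
    text = re.sub(r"\.(wav|mp3|flac)\b", "", text, flags=re.I)  # drop file extensions
    text = re.sub(r"\s+", " ", text).strip()                    # normalize whitespace
    if len(text.split()) < min_words:                           # reject too-short items
        return None
    if not text.endswith("."):
        text += "."
    return text[0].upper() + text[1:]

print(clean_raw_description("  dog barking   in the park.wav #fieldrecording "))
```

A language model replaces these brittle regex rules with open-ended rewriting, but the filtering intent (reject uninformative items, normalize the rest into sentence-like captions) is the same.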

Haohe Liu (@liuhaohe) 's Twitter Profile Photo

Excited to announce that our paper, "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models," has been accepted at #ICML2023. Many thanks to the reviewers for their invaluable feedback. It's nice to collaborate with Zehua Chen and other co-authors. Also, special

AK (@_akhaliq) 's Twitter Profile Photo

Universal Source Separation with Weakly Labelled Data

abs: arxiv.org/abs/2305.07447
paper page: huggingface.co/papers/2305.07…
github: github.com/bytedance/uss
Haohe Liu (@liuhaohe) 's Twitter Profile Photo

AudioLDM 2 paper is now available on arXiv: arxiv.org/pdf/2308.05734…
AudioLDM 2 project page (demo, code, Discord): audioldm.github.io/audioldm2/

Haohe Liu (@liuhaohe) 's Twitter Profile Photo

48kHz AudioLDM now open-sourced on GitHub 🔊 Text-to-HiFi audio generation, much better than the previous 16kHz version. The speed-optimized version will be available on HF and Diffusers soon. github.com/haoheliu/Audio…

Haohe Liu (@liuhaohe) 's Twitter Profile Photo

🔊 Introducing AudioSR: a plug-and-play & one-for-all solution to upsample your audio to stunning 48kHz quality!
👉 Significant improvement verified on MusicGen (32kHz), AudioLDM (16kHz), and FastSpeech2 (22kHz)!
Demo, code, and paper: audioldm.github.io/audiosr

#AudioSR
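Why does super-resolving 16kHz audio to 48kHz need a generative model rather than plain resampling? A signal sampled at 16kHz carries no content above its Nyquist frequency of 8kHz, and conventional (interpolation-based) upsampling cannot create any. The sketch below demonstrates this with ideal FFT-zero-padding resampling on a synthetic tone; the signal and numbers are illustrative, not from the AudioSR paper:

```python
import numpy as np

sr_in, sr_out = 16_000, 48_000
t = np.arange(sr_in) / sr_in                 # 1 second of audio -> 1 Hz FFT bins
x = np.sin(2 * np.pi * 5_000 * t)            # 5 kHz tone at a 16 kHz sample rate

# Ideal resampling to 48 kHz by zero-padding the spectrum: every bin
# above the original Nyquist (8 kHz) stays exactly zero.
X = np.fft.rfft(x)
X_up = np.zeros(sr_out // 2 + 1, dtype=complex)
X_up[: X.size] = X
y = np.fft.irfft(X_up, n=sr_out) * (sr_out / sr_in)

spec = np.abs(np.fft.rfft(y))
energy_above_8k = spec[8_000:].sum() / spec.sum()
print(energy_above_8k)  # ~0: no energy appears above the original Nyquist
```

Models like AudioSR instead *synthesize* plausible high-frequency content conditioned on the low band, which is why the 48kHz output can sound richer than any resampled version of the input.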
AK (@_akhaliq) 's Twitter Profile Photo

FoleyGen: Visually-Guided Audio Generation

paper page: huggingface.co/papers/2309.10…

Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues
Hung-yi Lee (李宏毅) (@hungyilee2) 's Twitter Profile Photo

Recent years have witnessed significant developments in audio codec models (an overview figure from arxiv.org/abs/2402.13236). We introduce Codec-SUPERB (arxiv.org/abs/2402.13071) to enable fair and comprehensive comparison. Leaderboard: codecsuperb.com

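Fair comparison of neural codecs, as Codec-SUPERB aims for, typically means matching bitrates. For a residual-vector-quantized (RVQ) codec the bitrate follows directly from the frame rate, the number of codebooks, and the codebook size. A quick sketch with illustrative, EnCodec-like numbers (not figures from either paper):

```python
import math

def rvq_bitrate_bps(frame_rate_hz, num_codebooks, codebook_size):
    """Bitrate of a residual-vector-quantized codec: each frame emits
    num_codebooks indices, each costing log2(codebook_size) bits."""
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)

# Illustrative configuration: 75 frames/s, 8 codebooks of 1024 entries each
print(rvq_bitrate_bps(75, 8, 1024) / 1000, "kbps")  # -> 6.0 kbps
```

Dropping codebooks at inference time scales the bitrate down in proportion, which is how many RVQ codecs expose multiple operating points from a single model.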
Haohe Liu (@liuhaohe) 's Twitter Profile Photo

New challenge at IEEE ICME 2024: Semi-supervised Acoustic Scene Classification under Domain Shift! The final submission deadline is Mar 22. Innovate and compete to win up to $600! 🏆 Baseline & more info: ascchallenge.xshengyun.com #IEEEICME2024 #MachineLearning

Thomas Pellegrini (@topel290118) 's Twitter Profile Photo

Machine listening people, please consider participating in the audio captioning task of DCASE. A new baseline system is provided: CNext-trans, 28M params, 29.6% SPIDEr-FL score on Clotho-eval.

dcase.community/challenge2024/…

github.com/Labbeti/dcase2…

#DCASE #audiocaptioning
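For context on the score quoted above: SPIDEr is the arithmetic mean of SPICE (semantic propositional content) and CIDEr (consensus with reference captions); the FL variant additionally penalizes captions flagged by a fluency-error detector (penalty details omitted here). The base combination is just:

```python
def spider(spice, cider):
    """SPIDEr: mean of the semantic (SPICE) and consensus (CIDEr) scores."""
    return 0.5 * (spice + cider)

# Illustrative component scores, not the baseline's actual numbers
print(round(spider(0.14, 0.45), 3))  # -> 0.295
```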