Our new Interspeech paper extracts a target conversation from recordings made in extremely noisy scenarios. Our deep learning method automatically identifies who is speaking in the conversation and extracts their voices. Useful for interviews, vlogs, and AI agents in the real world.
Paper:
"Target Speaker ASR with Whisper," Alexander Polok, Dominik Klement, Matthew Wiesner, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget, ift.tt/41sdNo2
I'm excited to introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a new codec model that can disentangle arbitrary audio sources into distinct latent codes for speech, music, and SFX.
Check our paper below 👇
Paper: arxiv.org/abs/2409.11228
Excited to share our new preprint: "Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training"! Our method detects anomalies and explains them using text, without any additional training. Read more here: arxiv.org/abs/2410.22056
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention has been accepted to #CVPR2025 with review scores of 5/4/4; it is a pure linear-attention diffusion model, 1.8× faster than DiT with FlashAttention-2 at 2048 resolution; code & paper: github.com/hustvl/DiG
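For context, here is a minimal recurrent sketch of gated linear attention, the general mechanism the title refers to; this is an assumption about the standard GLA recurrence, not DiG's exact formulation. The point is that the state update is constant-size per step, so cost grows linearly with sequence length instead of quadratically, which is where the speedup at high resolution comes from.

```python
# Minimal recurrent sketch of gated linear attention (GLA).
# Illustrative only -- not DiG's exact formulation.
import torch

def gated_linear_attention(q, k, v, g):
    """q, k, g: (T, d_k); v: (T, d_v). g in (0, 1) is a per-dimension decay gate.
    The state S is a fixed-size (d_k, d_v) matrix: it is decayed by the gate and
    updated with the outer product k_t v_t^T each step, so per-step memory and
    compute do not depend on sequence length."""
    T, d_k = q.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(T):
        S = g[t].unsqueeze(-1) * S + k[t].unsqueeze(-1) * v[t].unsqueeze(0)
        outputs.append(q[t] @ S)  # read out: (d_v,)
    return torch.stack(outputs)   # (T, d_v)

# toy usage
T, d_k, d_v = 16, 8, 8
q, k, v = torch.randn(T, d_k), torch.randn(T, d_k), torch.randn(T, d_v)
g = torch.sigmoid(torch.randn(T, d_k))    # gates in (0, 1)
out = gated_linear_attention(q, k, v, g)  # shape (16, 8)
```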
I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century".
The "compressed 21st century" comes from Dario's "Machine of Loving Grace" and if you haven’t read it, you probably
We knew very little about how LLMs actually work...until now.
Anthropic just dropped the most insane research paper, detailing some of the ways AI "thinks."
And it's completely different than we thought.
Here are their wild findings: 🧵
Reinforcement Learning (RL) is quickly becoming the most important skill for AI researchers. Here are the best resources for learning RL for LLMs…
TL;DR: RL is more important now than it has ever been, but (probably due to its complexity) there aren’t a ton of great resources
I interviewed for an ML research internship at Meta (FAIR) a few years back. Don’t remember every detail now, but a few questions stuck with me.
Questions are below.
I can't stress enough how useful this trick has been for me over the years.
It reduces GPU memory by a factor of N, where N is the number of losses, at literally no cost (same speed, exactly the same results down to the last decimal digit).
For example ... [1/2]
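My reading of the trick, as an assumption since the example itself is cut off above: when the N losses come from N independent forward passes, call .backward() on each loss as it is computed instead of summing them and backpropagating once. Gradients accumulate in .grad either way, but only one computation graph is alive at a time, so peak activation memory drops by roughly the factor N. A toy sketch with a stand-in model and data:

```python
# Hedged sketch of the per-loss backward trick (my reading of the tweet,
# not the author's exact code). Model, criterion and batches are toy stand-ins.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]  # N = 8 losses

# Variant A: sum the losses, one backward -> all 8 graphs stay in memory
# until the single backward call.
model.zero_grad()
total = sum(criterion(model(x), y) for x, y in batches)
total.backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

# Variant B: backward per loss -> each graph is freed right after its backward,
# so only one is alive at a time; gradients still accumulate in .grad.
model.zero_grad()
for x, y in batches:
    criterion(model(x), y).backward()
grads_per_loss = [p.grad.clone() for p in model.parameters()]

# The accumulated gradients match (up to float accumulation order).
print(all(torch.allclose(a, b) for a, b in zip(grads_summed, grads_per_loss)))
```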