Simon Tran (@tnpthanhh) 's Twitter Profile
Simon Tran

@tnpthanhh

ID: 1822842871214751744

Joined: 12-08-2024 03:50:08

16 Tweets

1 Follower

98 Following

Shyam Gollakota (@shyamgollakota) 's Twitter Profile Photo

Our new Interspeech paper extracts a target conversation recording in extremely noisy scenarios. Our deep learning method automatically identifies who is speaking in the conversation and extracts their voices. Useful for interviews, vlogs, and AI agents in the real world. Paper:

arXiv Sound (@arxivsound) 's Twitter Profile Photo

"Target Speaker ASR with Whisper," Alexander Polok, Dominik Klement, Matthew Wiesner, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget, ift.tt/41sdNo2

Xubo Liu @ ICLR 2025 🇸🇬 (@liuxub) 's Twitter Profile Photo

I'm excited to introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a new codec model that can disentangle arbitrary audio sources into distinct latent codes for speech, music, and SFX. Check our paper below 👇 Paper: arxiv.org/abs/2409.11228

Yohei Kawaguchi (@yohekawag) 's Twitter Profile Photo

Excited to share our new preprint: "Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training"! Our method detects anomalies and explains them using text, without additional training. Read more here: arxiv.org/abs/2410.22056

Xinggang Wang (@xinggangwang) 's Twitter Profile Photo

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention has been accepted to #CVPR2025 with review scores of 5, 4, 4. It is a pure linear-attention diffusion model, 1.8× faster than DiT with FlashAttention-2 at 2048 resolution. Code & paper: github.com/hustvl/DiG
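
The tweet names gated linear attention but, being a tweet, doesn't show the mechanism. As a hedged sketch (not the DiG code; the shapes, gating form, and function name below are assumptions), a gated linear-attention layer can be written as a recurrence over a small state matrix, which is what makes its cost linear in sequence length rather than quadratic:

# Hedged sketch of a gated linear-attention recurrence (illustrative only, not the
# DiG implementation; the shapes and gating form here are assumptions).
import torch

def gated_linear_attention(q, k, v, g):
    # q, k, g: (B, T, d_k); v: (B, T, d_v); gate values g in (0, 1).
    # A single (d_k, d_v) state per sequence is updated step by step, so the
    # cost is O(T) in sequence length instead of the O(T^2) of softmax attention.
    B, T, d_k = q.shape
    d_v = v.shape[-1]
    state = q.new_zeros(B, d_k, d_v)
    outputs = []
    for t in range(T):
        # Decay the state with the gate, then add the new key-value outer product.
        state = g[:, t].unsqueeze(-1) * state + k[:, t].unsqueeze(-1) * v[:, t].unsqueeze(-2)
        # Read the state out with the query for this step.
        outputs.append(torch.einsum('bk,bkv->bv', q[:, t], state))
    return torch.stack(outputs, dim=1)  # (B, T, d_v)

In practice such layers are computed with chunked or parallel scans rather than a Python loop, but the recurrence is where the linear scaling comes from.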

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century". The "compressed 21st century" comes from Dario's "Machines of Loving Grace" and if you haven’t read it, you probably

MatthewBerman (@matthewberman) 's Twitter Profile Photo

We knew very little about how LLMs actually work...until now. Anthropic just dropped the most insane research paper, detailing some of the ways AI "thinks." And it's completely different than we thought. Here are their wild findings: 🧵

Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

Reinforcement Learning (RL) is quickly becoming the most important skill for AI researchers. Here are the best resources for learning RL for LLMs… TL;DR: RL is more important now than it has ever been, but (probably due to its complexity) there aren’t a ton of great resources

Vishal Pandey (@its_vayishu) 's Twitter Profile Photo

I interviewed for an ML research internship at Meta (FAIR) a few years back. Don’t remember every detail now, but a few questions stuck with me. Questions are below.

Gabriele Berton (@gabriberton) 's Twitter Profile Photo

I can't stress enough how useful this trick has been for me over the years. It reduces GPU memory by a factor of N, where N is the number of losses, at literally no cost (same speed, exactly the same results down to the last decimal digit). For example ... [1/2]
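
The thread is cut off before the example, so the actual trick isn't shown in this excerpt. One common technique matching the description (identical accumulated gradients, roughly N× lower peak activation memory) is calling backward() on each loss as soon as it is computed instead of summing all N losses first. A minimal PyTorch sketch under that assumption, with all names hypothetical:

# Hedged sketch: per-loss backward() to reduce peak memory (this is an assumption
# about the trick, not taken from the truncated thread). Gradients accumulate
# across backward() calls, so the final update matches summing the losses first.
import torch

def step_summed(model, batches, loss_fns, optimizer):
    # Baseline: all N computation graphs stay alive until the single backward().
    optimizer.zero_grad()
    total = sum(loss_fn(model(x), y) for loss_fn, (x, y) in zip(loss_fns, batches))
    total.backward()
    optimizer.step()

def step_per_loss(model, batches, loss_fns, optimizer):
    # Each backward() frees that loss's graph immediately, so only one graph
    # is alive at a time; peak memory drops with the number of losses.
    optimizer.zero_grad()
    for loss_fn, (x, y) in zip(loss_fns, batches):
        loss_fn(model(x), y).backward()
    optimizer.step()

Note this only saves memory when each loss comes from its own forward pass (separate inputs or branches); losses sharing a single forward would need retain_graph=True and would not see the same benefit.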