Our new Interspeech paper extracts a target conversation from recordings made in extremely noisy scenarios. Our deep learning method automatically identifies who is speaking in the conversation and extracts their voices. Useful for interviews, vlogs, and AI agents in the real world.
Paper:
"Target Speaker ASR with Whisper," Alexander Polok, Dominik Klement, Matthew Wiesner, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget, ift.tt/41sdNo2
I'm excited to introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a new codec model that can disentangle arbitrary audio sources into distinct latent codes for speech, music, and SFX.
Check our paper below 👇
Paper: arxiv.org/abs/2409.11228
Excited to share our new preprint: "Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training"! Our method detects anomalies and explains them using text, without any additional training. Read more here: arxiv.org/abs/2410.22056
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention has been accepted to #CVPR2025 with review scores of 5/4/4; it is a pure linear-attention diffusion model, 1.8× faster than DiT with FlashAttention-2 at 2048 resolution; code & paper: github.com/hustvl/DiG
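For context, here is a minimal recurrent sketch of gated linear attention, the general mechanism the title refers to; this is an assumption about the standard GLA recurrence, not DiG's exact formulation. The point is that the state update is constant-size per step, so cost grows linearly with sequence length instead of quadratically, which is where the speedup at high resolution comes from.

```python
# Minimal recurrent sketch of gated linear attention (GLA).
# Illustrative only -- not DiG's exact formulation.
import torch

def gated_linear_attention(q, k, v, g):
    """q, k, g: (T, d_k); v: (T, d_v). g in (0, 1) is a per-dimension decay gate.
    The state S is a fixed-size (d_k, d_v) matrix: it is decayed by the gate and
    updated with the outer product k_t v_t^T each step, so per-step memory and
    compute do not depend on sequence length."""
    T, d_k = q.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(T):
        S = g[t].unsqueeze(-1) * S + k[t].unsqueeze(-1) * v[t].unsqueeze(0)
        outputs.append(q[t] @ S)  # read out: (d_v,)
    return torch.stack(outputs)   # (T, d_v)

# toy usage
T, d_k, d_v = 16, 8, 8
q, k, v = torch.randn(T, d_k), torch.randn(T, d_k), torch.randn(T, d_v)
g = torch.sigmoid(torch.randn(T, d_k))    # gates in (0, 1)
out = gated_linear_attention(q, k, v, g)  # shape (16, 8)
```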
I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century".
The "compressed 21st century" comes from Dario's "Machine of Loving Grace" and if you haven’t read it, you probably
We knew very little about how LLMs actually work...until now.
Anthropic just dropped the most insane research paper, detailing some of the ways AI "thinks."
And it's completely different than we thought.
Here are their wild findings: 🧵
Reinforcement Learning (RL) is quickly becoming the most important skill for AI researchers. Here are the best resources for learning RL for LLMs…
TL;DR: RL is more important now than it has ever been, but (probably due to its complexity) there aren’t a ton of great resources
I interviewed for an ML research internship at Meta (FAIR) a few years back. Don’t remember every detail now, but a few questions stuck with me.
Questions are below.
I can't stress enough how useful this trick has been for me over the years.
It reduces GPU memory by a factor of N, where N is the number of losses, at literally no cost (same speed, exactly the same results down to the last decimal digit).
For example ... [1/2]
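My reading of the trick, as an assumption since the example itself is cut off above: when the N losses come from N independent forward passes, call .backward() on each loss as it is computed instead of summing them and backpropagating once. Gradients accumulate in .grad either way, but only one computation graph is alive at a time, so peak activation memory drops by roughly the factor N. A toy sketch with a stand-in model and data:

```python
# Hedged sketch of the per-loss backward trick (my reading of the tweet,
# not the author's exact code). Model, criterion and batches are toy stand-ins.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]  # N = 8 losses

# Variant A: sum the losses, one backward -> all 8 graphs stay in memory
# until the single backward call.
model.zero_grad()
total = sum(criterion(model(x), y) for x, y in batches)
total.backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

# Variant B: backward per loss -> each graph is freed right after its backward,
# so only one is alive at a time; gradients still accumulate in .grad.
model.zero_grad()
for x, y in batches:
    criterion(model(x), y).backward()
grads_per_loss = [p.grad.clone() for p in model.parameters()]

# The accumulated gradients match (up to float accumulation order).
print(all(torch.allclose(a, b) for a, b in zip(grads_summed, grads_per_loss)))
```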