Sung-Feng Huang (@sungfenghuang)'s Twitter Profile
Sung-Feng Huang

@sungfenghuang

National Taiwan University | Speech Processing & Machine Learning Lab @ntu_spml

ID: 378591891

Joined: 23-09-2011 13:32:43

22 Tweets

63 Followers

256 Following

RJ Skerry-Ryan (@rustyryan)'s Twitter Profile Photo

New work with Ron Weiss, Eric Battenberg, Soroosh Mariooryad, Durk Kingma -- finally achieving what Yuxuan Wang and I set out to do in 2016 before switching to spectrograms: direct waveform generation from characters. (1/7) abs: arxiv.org/abs/2011.03568 samples: google.github.io/tacotron/publi…

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Understanding the Difficulty of Training Transformers (arxiv.org/pdf/2004.08249…, Liyuan Liu (Lucas)) Studies the instability of transformers. Dives into the dark arts of NN stability, the impact of layer norm / residuals. Isolates residual paths as a main cause. Trains 60-layer transformers.

Hung-yi Lee (李宏毅) (@hungyilee2)'s Twitter Profile Photo

Three years ago, when we first tried to use GAN to realize unsupervised ASR (arxiv.org/abs/1804.00316), I thought the idea was sci-fi. But a few days ago, Facebook AI pushed the idea of using GAN for unsupervised ASR to 5.9% WER on Librispeech (ai.facebook.com/blog/wav2vec-u…).

NTU SPML Lab (@ntu_spml)'s Twitter Profile Photo

Honored to cooperate with researchers from Facebook, CMU, MIT, and JHU to develop SUPERB. When you pre-train an LM like BERT on text, you use GLUE to evaluate its performance. How about speech? You can use SUPERB, which will be the speech version of GLUE. superbbenchmark.org

Hung-yi Lee (李宏毅) (@hungyilee2)'s Twitter Profile Photo

Two tutorials at INTERSPEECH'22.
Self-Supervised Representation Learning for Speech Processing, slides: docs.google.com/presentation/d…
Neural Speech Synthesis, slides: github.com/tts-tutorial/i…

Xuanjun (Victor) Chen 🤖 (@xjchen_ntu)'s Twitter Profile Photo

🚨 Call for Papers – ASRU 2025 Special Session
🎤 Responsible Speech & Audio Generative AI
📍 Honolulu, Hawaii · Dec 2025
Join us to tackle accountability, fairness, and trust in generative speech/music/audio systems!
👉 Deadline: May 28, 2025
🔗 Details: codecfake.github.io/RespSA-GenAI/

Sreyan Ghosh (@sreyang)'s Twitter Profile Photo

We at NVIDIA and GAMMA UMD are excited to release Audio Flamingo 3, the most powerful, open, and capable large audio-language model to date!
Paper: arxiv.org/abs/2507.08128
Open-source model, code, and data: research.nvidia.com/labs/adlr/AF3/
Try it out here: huggingface.co/spaces/nvidia/…

Cheng Han Chiang (姜成翰) (@dcml0714)'s Twitter Profile Photo

1/7 🔗 Introducing STITCH: our new method to make Spoken Language Models (SLMs) think and talk at the same time. Paper link 👉 arxiv.org/abs/2507.15375