arXiv Sound (@arxivsound)'s Twitter Profile
arXiv Sound

@arxivsound

Sound-related articles (cs.SD and eess.AS) on arxiv.org

ID: 1281419051571400705

10-07-2020 02:44:55

14.14K Tweets

5.5K Followers

32 Following

Thomas Thebaud, Yen-Ju Lu, Matthew Wiesner, Peter Viechnicki, Najim Dehak, "Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM," arxiv.org/abs/2508.04795

David Sasu, Natalie Schluter, "Pitch Accent Detection improves Pretrained Automatic Speech Recognition," arxiv.org/abs/2508.04814

Yael Segal-Feldman, Ann R. Bradlow, Matthew Goldrick, Joseph Keshet, "Keyword Spotting with Hyper-Matched Filters for Small Footprint Devices," arxiv.org/abs/2508.04857

Fangyu Du, Taiqing Li, Ziwei Zhang, Qian Qiao, Tan Yu, Dingcheng Zhen, Xu Jia, Yang Yang, Shunshun Yin, Siyuan Liu, "RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer," arxiv.org/abs/2508.05115

Farah Wahida, M. A. P. Chamikara, et al., "From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization," arxiv.org/abs/2508.05409

Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito, "Embedding Alignment in Code Generation for Audio," arxiv.org/abs/2508.05473

Serkan Sulun, Paula Viana, Matthew E. P. Davies, "Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries," arxiv.org/abs/2502.10154

Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters, "PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective," arxiv.org/abs/2309.02265

Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King, "Recent Advances in Speech Language Models: A Survey," arxiv.org/abs/2410.03751

Han Zhu, Wei Kang, Zengwei Yao, Liyong Guo, Fangjun Kuang, Zhaoqing Li, Weiji Zhuang, Long Lin, Daniel Povey, "ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching," arxiv.org/abs/2506.13053

Shakeel A. Sheikh, Md. Sahidullah, Ina Kodrasi, "Overview of Automatic Speech Analysis and Technologies for Neurodegenerative Disorders: Diagnosis and Assistive Applications," arxiv.org/abs/2501.03536

Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li, "UniTalker: Conversational Speech-Visual Synthesis," arxiv.org/abs/2508.04585

Henri Gode, Simon Doclo, "Closed-Form Successive Relative Transfer Function Vector Estimation based on Blind Oblique Projection Incorporating Noise Whitening," arxiv.org/abs/2508.04887

Nameer Hirschkind, Joseph Liu, Mahesh Kumar Nandwana, Xiao Yu, "REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation," arxiv.org/abs/2508.04946

Yuepeng Jiang, Ziqian Ning, Shuai Wang, Chengjia Wang, Mengxiao Bi, Pengcheng Zhu, Lei Xie, Zhonghua Fu, "REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers," arxiv.org/abs/2508.04996

Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, "MOVER: Combining Multiple Meeting Recognition Systems," arxiv.org/abs/2508.05055

Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala, "Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS," arxiv.org/abs/2508.05102

Seraphina Fong, Marco Matassoni, Alessio Brutti, "Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages," arxiv.org/abs/2508.05149

Tom Bäckström, Mohammad Hassan Vali, My Nguyen, Silas Rech, "Privacy Disclosure of Similarity in Speech and Language Processing," arxiv.org/abs/2508.05250

Jiatong Li, Simon Doclo, "Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement," arxiv.org/abs/2508.05293