arXiv Sound (@arxivsound) Twitter Tweets • TwiCopy

arXiv Sound

3 months ago

Thomas Thebaud, Yen-Ju Lu, Matthew Wiesner, Peter Viechnicki, Najim Dehak, "Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM," arxiv.org/abs/2508.04795

David Sasu, Natalie Schluter, "Pitch Accent Detection improves Pretrained Automatic Speech Recognition," arxiv.org/abs/2508.04814

Yael Segal-Feldman, Ann R. Bradlow, Matthew Goldrick, Joseph Keshet, "Keyword Spotting with Hyper-Matched Filters for Small Footprint Devices," arxiv.org/abs/2508.04857

Fangyu Du, Taiqing Li, Ziwei Zhang, Qian Qiao, Tan Yu, Dingcheng Zhen, Xu Jia, Yang Yang, Shunshun Yin, Siyuan Liu, "RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer," arxiv.org/abs/2508.05115

Farah Wahida, M. A. P. Chamikara, et al., "From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization," arxiv.org/abs/2508.05409

Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito, "Embedding Alignment in Code Generation for Audio," arxiv.org/abs/2508.05473

Serkan Sulun, Paula Viana, Matthew E. P. Davies, "Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries," arxiv.org/abs/2502.10154

Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters, "PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective," arxiv.org/abs/2309.02265

Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King, "Recent Advances in Speech Language Models: A Survey," arxiv.org/abs/2410.03751

Han Zhu, Wei Kang, Zengwei Yao, Liyong Guo, Fangjun Kuang, Zhaoqing Li, Weiji Zhuang, Long Lin, Daniel Povey, "ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching," arxiv.org/abs/2506.13053

Shakeel A. Sheikh, Md. Sahidullah, Ina Kodrasi, "Overview of Automatic Speech Analysis and Technologies for Neurodegenerative Disorders: Diagnosis and Assistive Applications," arxiv.org/abs/2501.03536

Yifan Hu, Rui Liu, Yi Ren, Xiang Yin, Haizhou Li, "UniTalker: Conversational Speech-Visual Synthesis," arxiv.org/abs/2508.04585

Henri Gode, Simon Doclo, "Closed-Form Successive Relative Transfer Function Vector Estimation based on Blind Oblique Projection Incorporating Noise Whitening," arxiv.org/abs/2508.04887

Nameer Hirschkind, Joseph Liu, Mahesh Kumar Nandwana, Xiao Yu, "REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation," arxiv.org/abs/2508.04946

Yuepeng Jiang, Ziqian Ning, Shuai Wang, Chengjia Wang, Mengxiao Bi, Pengcheng Zhu, Lei Xie, Zhonghua Fu, "REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers," arxiv.org/abs/2508.04996

Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, "MOVER: Combining Multiple Meeting Recognition Systems," arxiv.org/abs/2508.05055

Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala, "Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS," arxiv.org/abs/2508.05102

Seraphina Fong, Marco Matassoni, Alessio Brutti, "Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages," arxiv.org/abs/2508.05149

Tom Bäckström, Mohammad Hassan Vali, My Nguyen, Silas Rech, "Privacy Disclosure of Similarity in Speech and Language Processing," arxiv.org/abs/2508.05250

Jiatong Li, Simon Doclo, "Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement," arxiv.org/abs/2508.05293
