Sreyan Ghosh
@sreyang
Ph.D. in CS at University of Maryland, College Park | Ex- Adobe Research, NVIDIA, Cisco | Speech, Audio and Language Processing Researcher
ID: 2582506255
https://sreyan88.github.io/
22-06-2014 16:21:12
311 Tweets
245 Followers
278 Following
Open-source audio scene is quite on 🔥 lately!
- kyutai STT, TTS modules and Unmute fully open-sourced
- NVIDIA drops 3 models: Parakeet (beats Whisper), Audio Flamingo 3 and Canary-Qwen-2.5B (new SOTA on Hugging Face leaderboard)
- Mistral AI released 3B and 24B Voxtral
🎶 Meet Audio-Flamingo 3 – a fully open LALM trained on sound, speech, and music datasets. 🎶 Handles 10-min audio, long-form text, and voice conversations. Perfect for audio QA, dialog, and reasoning. On Hugging Face ➡️ huggingface.co/nvidia/audio-f… From #NVIDIAResearch.