Sreyan Ghosh (@sreyang)'s Twitter Profile
Sreyan Ghosh

@sreyang

Ph.D. in CS at University of Maryland, College Park | Ex- Adobe Research, NVIDIA, Cisco | Speech, Audio and Language Processing Researcher

ID: 2582506255

https://sreyan88.github.io/ | 22-06-2014 16:21:12

311 Tweets

245 Followers

278 Following

GAMMA UMD (@gammaumd)

🚀 Audio General Intelligence (AGI) is no longer a dream — it’s here. Introducing Audio Flamingo 3 — open-source, multimodal, and groundbreaking. It listens. It understands. It reasons across sound and language. 💥 Code, weights, datasets, paper — all open. 📄Paper:

Marktechpost AI Research News ⚡ (@marktechpost)

NVIDIA Releases Audio Flamingo 3: An Open-Source Model Advancing Audio General Intelligence NVIDIA’s Audio Flamingo 3 (AF3) is a fully open-source large audio-language model that significantly advances the field of Audio General Intelligence. Unlike earlier systems focused on

Niels Rogge (@nielsrogge)

Open-source audio scene is quite on 🔥 lately!
- kyutai STT, TTS modules and Unmute fully open-sourced
- NVIDIA drops 3 models: Parakeet (beats Whisper), Audio Flamingo 3 and Canary-Qwen-2.5B (new SOTA on Hugging Face leaderboard)
- Mistral AI released 3B and 24B Voxtral

Sakib (@zsakib_)

zsxkib/audio-flamingo-3 from NVIDIA
a chain-of-thought audio language model (that's small+fast) on Replicate
you can upload an mp3 and ask:
> what instruments do you hear?🙉
> transcribe any speech you hear🗣️
> please describe the audio in detail🎨
> answer the question💬

Sakib (@zsakib_)

nvidia's audio-flamingo-3 in action (sound on 🔇→🎶)
> upload an audio clip saying "what are the names of some famous actors who started their careers on broadway"🎭
> prompt "answer"
TIL Tom Hanks was on broadway

NVIDIA AI Developer (@nvidiaaidev)

🎶 Meet Audio-Flamingo 3 – a fully open LALM trained on sound, speech, and music datasets. 🎶 Handles 10-min audio, long-form text, and voice conversations. Perfect for audio QA, dialog, and reasoning. On Hugging Face ➡️ huggingface.co/nvidia/audio-f… From #NVIDIAResearch.

naveen manwani (@naveenmanwani17)

🚨Paper Alert 🚨 ➡️Paper Title: Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models 🌟Few pointers from the paper 🎯Authors of this paper presented “Audio Flamingo 3 (AF3)”, a fully open state-of-the-art (SOTA) large audio-language model

ハカセ アイ(Ai-Hakase)🐾最新トレンドAIのためのX 🐾 (@ai_hakase_)

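[A revolution in AI that listens! Audio understanding evolves with NVIDIA Audio Flamingo 3 (AF3)!] NVIDIA has announced a groundbreaking technology, "Audio Flamingo 3 (AF3)"! ✨ It is a state-of-the-art open-source AI model that understands speech, sound, and music in an integrated way 👂💡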

Banghua Zhu (@banghuaz)

That's exactly why I'm excited about the unique position of the post-training team at NVIDIA. We’re not just releasing open-weight models — we fully open source the data, code, and technical details. Small team, moving fast. The competition is fierce, and Chinese open model

Rabeeh Karimi (@karimirabeeh)

We just released Nemotron-CC-Math 🚀 Equations on the web aren't just LaTeX: they're in MathML, <pre> tags, inline, even images. Code shows up in just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to capture math equations + code reliably

Bryan Catanzaro (@ctnzr)

As part of Nemotron, we're releasing a new Math dataset, made by rendering webpages using Lynx and then using an LLM to rewrite the result into LaTeX. Our models got much better at math when we started using this dataset. We hope it's helpful to the community. 💚

HanRong YE (@leoyerrrr)

OmniVinci is now #1 paper on Huggingface!!! 🤗 Building omni-modal LLMs is MORE than just mixing tokens 😉 At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems — leading to OmniVinci-9B, which introduces three key innovations: - OmniAlignNet – a

kyutai (@kyutai_labs)

1/2 We’re releasing an in-depth tutorial on neural audio codecs, the secret sauce that makes it possible for audio LLMs to not sound like a horror movie: