Ankur Bapna (@ankurbpn) 's Twitter Profile
Ankur Bapna

@ankurbpn

Native Audio in Gemini @GoogleDeepmind

ID: 2330682822

Joined: 06-02-2014 18:33:56

433 Tweets

855 Followers

634 Following

Sai Nemani (@sainemani1) 's Twitter Profile Photo

Gemini Native Audio is INSANE! It literally made this video. The editing is mine though :) (Also the thumbnail(s) are AI generated) youtu.be/qsabdvDsVXM

Google AI Developers (@googleaidevs) 's Twitter Profile Photo

Gemini 2.5 Flash Preview now supports native audio output via the Live API for seamless, natural spoken interactions and greater voice control. A new experimental thinking version of this audio model supports reasoning capabilities for more complex tasks. ai.google.dev/gemini-api/doc…

ホーダチ-Hodatsu | LLM Researcher × AI Engineer (@hokazuya) 's Twitter Profile Photo

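Gemini 2.5 Flash Preview Native Audio Dialog is so amazing that it made me laugh, lol. I think it deserves far more buzz than it's getting. (Maybe I just haven't noticed the discussion, but to me this is far more impressive than Veo and the like.)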

Philipp Schmid (@_philschmid) 's Twitter Profile Photo

Building the text-to-speech Agents and Apps with Google DeepMind Gemini 2.5 is super easy! Single API request to generate 5-10 minute long audio in one of 30 voices in 24 languages or with multiple speakers!

👩‍💻 Paige Bailey (@dynamicwebpaige) 's Twitter Profile Photo

💬 Did you know that the Gemini APIs in @GoogleAIStudio support text-to-speech (TTS)? Even better: it's supported in multiple languages, accents, and tones (including whisper, angry, sad, excited, and more). We even support multiple speakers! 👇Learn more in the docs below:

👩‍💻 Paige Bailey (@dynamicwebpaige) 's Twitter Profile Photo

europeans will say "let's get something quick for a snack" and proceed to randomly select one of four bakeries within line of sight that all sell the best sandwich you've ever eaten, for $4 someone please disrupt the lie that is the american sandwich market

👩‍💻 Paige Bailey (@dynamicwebpaige) 's Twitter Profile Photo

🐤 This commercial is using Veo 3 to generate the visuals, Gemini text-to-speech for the voiceover, and MusicFX for the audio. I actually like it better than the original (and it only took 8 minutes to create, not counting the video generation time!):

Google AI Developers (@googleaidevs) 's Twitter Profile Photo

🔊Native audio outputs in Gemini 2.5 give developers new ways to build richer applications with conversation and speech. ↓ blog.google/technology/goo…

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Our native audio capabilities are making AI conversations more natural – from understanding tone to generating expressive speech. ✍️🗣️ This could open up new possibilities for how we interact with AI. Developers, try it through Google AI Studio. Learn more. ↓

Google (@google) 's Twitter Profile Photo

New native audio capabilities in Gemini 2.5 enable text-to-speech in over 24 languages. 🔊Voices are more natural and expressive, and you can seamlessly switch between languages.

Sundar Pichai (@sundarpichai) 's Twitter Profile Photo

Our latest Gemini 2.5 Pro update is now in preview. It’s better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads lmarena.ai with a 24pt Elo score jump since the previous version. We also
