Ankur Bapna (@ankurbpn)'s Twitter Profile
Ankur Bapna

@ankurbpn

Native Audio in Gemini @GoogleDeepmind

ID: 2330682822

Joined: 06-02-2014 18:33:56

433 Tweets

855 Followers

634 Following

Sai Nemani (@sainemani1)

Gemini Native Audio is INSANE! It literally made this video. The editing is mine though :) (Also the thumbnail(s) are AI generated) youtu.be/qsabdvDsVXM

Google AI Developers (@googleaidevs)

Gemini 2.5 Flash Preview now supports native audio output via the Live API for seamless, natural spoken interactions and greater voice control. A new experimental thinking version of this audio model supports reasoning capabilities for more complex tasks. ai.google.dev/gemini-api/doc…

ホーダチ-Hodatsu | LLM Researcher × AI Engineer (@hokazuya)

Gemini 2.5 Flash Preview Native Audio Dialog is so amazing it made me laugh, lol. It's good enough that I think it deserves a lot more buzz. (Maybe I just haven't been paying attention, but to me this is way more impressive than Veo and the like.)

Philipp Schmid (@_philschmid)

Building the text-to-speech Agents and Apps with Google DeepMind Gemini 2.5 is super easy! Single API request to generate 5-10 minute long audio in one of 30 voices in 24 languages or with multiple speakers!

πŸ‘©β€πŸ’» Paige Bailey (@dynamicwebpaige) 's Twitter Profile Photo

💬 Did you know that the Gemini APIs in @GoogleAIStudio support text-to-speech (TTS)?

Even better: it's supported in multiple languages, accents, and tones (including whisper, angry, sad, excited, and more). We even support multiple speakers!

👇 Learn more in the docs below:
πŸ‘©β€πŸ’» Paige Bailey (@dynamicwebpaige) 's Twitter Profile Photo

europeans will say "let's get something quick for a snack" and proceed to randomly select one of four bakeries within line of sight that all sell the best sandwich you've ever eaten, for $4

someone please disrupt the lie that is the american sandwich market
πŸ‘©β€πŸ’» Paige Bailey (@dynamicwebpaige) 's Twitter Profile Photo

This commercial is using Veo 3 to generate the visuals, Gemini text-to-speech for the voiceover, and MusicFX for the audio. I actually like it better than the original (and it only took 8 minutes to create, not counting the video generation time!):

Google AI Developers (@googleaidevs)

🔊 Native audio outputs in Gemini 2.5 give developers new ways to build richer applications with conversation and speech. ↓ blog.google/technology/goo…

Google DeepMind (@googledeepmind)

Our native audio capabilities are making AI conversations more natural, from understanding tone to generating expressive speech. 🗣️ This could open up new possibilities for how we interact with AI. Developers, try it through Google AI Studio. Learn more. ↓

Google (@google)

New native audio capabilities in Gemini 2.5 enable text-to-speech in over 24 languages. 🔊 Voices are more natural and expressive, and you can seamlessly switch between languages.

Sundar Pichai (@sundarpichai)

Our latest Gemini 2.5 Pro update is now in preview.

It's better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads lmarena.ai with a 24pt Elo score jump since the previous version.

We also