Tara Sainath (@tnsainath) Twitter Tweets • TwiCopy

Jeff Dean

@jeffdean

2 years ago

I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,

Google DeepMind

@googledeepmind

2 years ago

Along with text, images, video and code, Gemini is able to process raw audio signal end-to-end. 🔊 It can listen to and understand speech, making it not only useful for transcription but a model that has a much more nuanced perception of its environment. ↓

Oriol Vinyals

@oriolvinyalsml

2 years ago

Gemini 1.5 has arrived. Pro 1.5 with 1M tokens available as an experimental feature via AI Studio and Vertex AI in private preview. Then there’s this: In our research, we tested Gemini 1.5 on up to 2M tokens for audio, 2.8M tokens for video, and 🤯10M 🤯 tokens for text. From

Jeff Dean

@jeffdean

a year ago

This video is a glimpse of Project Astra's utility in going about daily life. Remember this door code. What do these funny laundry icons mean on my clothes tag? Will this bus take me where I want to go? Which book will my friend enjoy the most? What can you tell me about this

Logan Kilpatrick

@officiallogank

a year ago

Gemini 2.0 Flash comes with native audio output, and it’s actually wild 🤯 we are working hard to roll this out quickly to more folks!

Google DeepMind

@googledeepmind

7 months ago

💬 Smarter dialogue: Gemini-powered native audio means Project Astra has better context and customizable accents. 🕹️ Takes action: Computer control lets it open and engage with apps at your direction. 🤝 Personalized help: Integrates with your @Gmail, @GoogleCalendar and more

Tara Sainath

@tnsainath

7 months ago

Check out the new live audio-to-audio dialog model: native audio with proactivity, affective dialog, tool calling, and more.

Tara Sainath

@tnsainath

7 months ago

The audio team released new dialog and TTS models. Check them out at aistudio.google.com/live

Sad Albert

@mars53208096

7 months ago

Do NOT SLEEP on Gemini 2.5's multimodal audio! It is 100 times better than GPT 4o, 50 times less censored and 1000 times better than Grok🐸. Check these examples out of Gemini 2.5's emotional speech capabilities. It does Not have voice cracks and a lot capable and clearer than I

Google AI Developers

@googleaidevs

7 months ago

See Native Audio in action 🤠🦊 Our "Mumble Jumble" demo in Google AI Studio showcases the Live API's advanced voice capabilities: natural flow, distinct tone, emotion, and multilingual support.

Tara Sainath

@tnsainath

6 months ago

Check out the new thinking dialog model we have, which can handle a lot more complex reasoning tasks.

Google DeepMind

@googledeepmind

6 months ago

Our native audio capabilities are making AI conversations more natural – from understanding tone to generating expressive speech. ✍️🗣️ This could open up new possibilities for how we interact with AI. Developers, try it through Google AI Studio. Learn more. ↓

Tara Sainath

@tnsainath

6 months ago

Check out the native audio dialog and TTS models from our team, which have landed in AI Studio.
