Michael Tschannen (@mtschannen) 's Twitter Profile
Michael Tschannen

@mtschannen

Research Scientist @GoogleDeepMind. Representation learning for multimodal understanding and generation. Personal account.

ID: 597511633

Link: https://mitscha.github.io · Joined: 02-06-2012 15:38:44

238 Tweets

2.2K Followers

658 Following

Google AI Developers (@googleaidevs) 's Twitter Profile Photo

PaliGemma 2 mix is an upgraded vision-language model that supports image captioning, OCR, image Q&A, object detection, and segmentation. With sizes from 3B-28B parameters, there's a model for everyone. Get started. → goo.gle/430HnDe
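A minimal sketch of querying a PaliGemma 2 mix checkpoint for captioning via Hugging Face transformers; the checkpoint ID, local image path, and task-prefix prompt are placeholders for illustration, not details taken from the announcement.

```python
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Checkpoint ID assumed for illustration; see the goo.gle link above for the released sizes.
ckpt = "google/paligemma2-3b-mix-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(ckpt).eval()
processor = PaliGemmaProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")   # placeholder image path
prompt = "caption en"               # mix checkpoints take task prefixes, e.g. "caption", "ocr", "detect"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)

# Strip the prompt tokens and decode only the newly generated caption.
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(out[0][prompt_len:], skip_special_tokens=True))
```

Swapping the prompt prefix (e.g. "ocr" or "detect <object>") switches the same checkpoint between the tasks listed in the tweet.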
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Google presents:

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Open-sources model checkpoints in four sizes from 86M to 1B
AK (@_akhaliq) 's Twitter Profile Photo

Google just dropped SigLIP 2

Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Aritra R G (@arig23498) 's Twitter Profile Photo

Did Google just release a better version of SigLIP?

SigLIP 2 is out on Hugging Face!

A new family of multilingual vision-language encoders that crush it in zero-shot classification, image-text retrieval, and VLM feature extraction.

🧵👇
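As a sketch of the zero-shot classification use case mentioned above, assuming the standard SigLIP API in transformers; the checkpoint ID, image path, and label set are placeholders:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"   # checkpoint ID assumed; check the Hub for the released names
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")          # placeholder image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]  # open-ended classes

inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # shape: (num_images, num_texts)

# SigLIP is trained with a sigmoid loss, so per-pair scores go through a sigmoid,
# not a softmax over the label set.
probs = torch.sigmoid(logits)[0]
for label, p in zip(labels, probs):
    print(f"{p:.2%}  {label}")
```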
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

This is exciting... SigLIP was one of the best and most performant image encoders, used in a wide variety of applications, so a new version of SigLIP from the same team should be pretty good as well!
merve (@mervenoyann) 's Twitter Profile Photo

SigLIP 2 is the most powerful image-text encoder

you can use it to do
> image-to-image search
> text-to-image search
> image-to-text search
> image classification with open-ended classes
> train vision language models

we will show you how to do all this week 🤝
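A rough sketch of the text-to-image search workflow from the list above, using separate image and text embeddings from the same (assumed) SigLIP 2 checkpoint; the file names and query are placeholders:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"   # checkpoint ID assumed
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

# Embed a small image gallery once; in practice these vectors would go into an index.
paths = ["beach.jpg", "mountain.jpg", "city.jpg"]   # placeholder files
images = [Image.open(p) for p in paths]
img_inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    img_emb = F.normalize(model.get_image_features(**img_inputs), dim=-1)

# Embed a free-form text query and rank the gallery by cosine similarity.
query = "a snowy mountain at sunrise"
txt_inputs = processor(text=[query], padding="max_length", return_tensors="pt")
with torch.no_grad():
    txt_emb = F.normalize(model.get_text_features(**txt_inputs), dim=-1)

scores = (txt_emb @ img_emb.T).squeeze(0)
for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.3f}  {paths[idx]}")
```

Image-to-image and image-to-text search follow the same pattern, comparing whichever pair of embedding sets is needed.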
Boris Dayma 🖍️ (@borisdayma) 's Twitter Profile Photo

Caught up on the new SigLIP 2 paper 🤓

Cool things I learnt from it:
- finally using a patch size of 16 with resolutions that are multiples of 256! I was annoyed by the weird patch-14 / 224 sizes, which felt like an old ImageNet training augmentation artifact
- reference to LocCa for pretraining, which

Alexander Visheratin (@visheratin) 's Twitter Profile Photo

SigLIP 2 is indeed a better encoder than SigLIP! Over the last two weekends, I trained a new SOTA multilingual model - mexma-siglip2. It not only has improved performance but also an MIT license. Michael Tschannen Xiaohua Zhai Lucas Beyer (bl16) merve SkalskiP

André Araujo (@andrefaraujo) 's Twitter Profile Photo

Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to ViT-g, with spatial awareness, suitable for many multimodal AI applications. Can’t wait to see what the community will build with them!

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

I’m so happy to announce Gemma 3 is out! 🚀

🌏Understands over 140 languages
👀Multimodal with image and video input
🤯LMArena score of 1338!
📏Context window of 128k

Available in AI Studio, Hugging Face, Ollama, Vertex, and your favorite OS tools 🚀 Download it today!
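For a quick local try-out, a minimal text-only sketch using the transformers chat interface; the model ID and prompt are assumptions, and the image/video inputs mentioned above require the larger instruction-tuned checkpoints rather than this one.

```python
from transformers import pipeline

# Model ID assumed for illustration; the 1B variant is text-only, larger Gemma 3 sizes add image input.
pipe = pipeline("text-generation", model="google/gemma-3-1b-it")

messages = [{"role": "user", "content": "Summarize what a vision-language encoder does in one sentence."}]
out = pipe(messages, max_new_tokens=64)

# The pipeline returns the full chat transcript; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```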
Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →

Michael Tschannen (@mtschannen) 's Twitter Profile Photo

We are presenting JetFormer at ICLR this morning, poster #190. Stop by if you’re interested in unified multimodal architectures!

Mario Lucic (@mariolucic_) 's Twitter Profile Photo

Massive advancements in video understanding with Gemini 2.5! ✨ Unlock new capabilities to process hours of video, summarize and retrieve key moments, generate animations, and even combine video with code for interactive experiences. Check the 🧵below for some cool use-cases.