Michael Tschannen (@mtschannen) 's Twitter Profile
Michael Tschannen

@mtschannen

Research Scientist @GoogleDeepMind. Representation learning for multimodal understanding and generation. Personal account.

ID: 597511633

Link: https://mitscha.github.io · Joined: 02-06-2012 15:38:44

238 Tweets

2.2K Followers

658 Following

Google AI Developers (@googleaidevs) 's Twitter Profile Photo

PaliGemma 2 mix is an upgraded vision-language model that supports image captioning, OCR, image Q&A, object detection, and segmentation. With sizes from 3B-28B parameters, there's a model for everyone. Get started. → goo.gle/430HnDe
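A minimal sketch of querying a PaliGemma 2 mix checkpoint for captioning via Hugging Face transformers; the checkpoint ID, local image path, and task-prefix prompt are placeholders for illustration, not details taken from the announcement.

```python
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Checkpoint ID assumed for illustration; see the goo.gle link above for the released sizes.
ckpt = "google/paligemma2-3b-mix-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(ckpt).eval()
processor = PaliGemmaProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")   # placeholder image path
prompt = "caption en"               # mix checkpoints take task prefixes, e.g. "caption", "ocr", "detect"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)

# Strip the prompt tokens and decode only the newly generated caption.
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(out[0][prompt_len:], skip_special_tokens=True))
```

Swapping the prompt prefix (e.g. "ocr" or "detect <object>") switches the same checkpoint between the tasks listed in the tweet.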
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Google presents:

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Open-sources model checkpoints in four sizes from 86M to 1B
AK (@_akhaliq) 's Twitter Profile Photo

Google just dropped SigLIP 2

Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Aritra R G (@arig23498) 's Twitter Profile Photo

Did Google just release a better version of SigLIP?

SigLIP 2 is out on Hugging Face!

A new family of multilingual vision-language encoders that crush it in zero-shot classification, image-text retrieval, and VLM feature extraction.

🧵👇
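As a sketch of the zero-shot classification use case mentioned above, assuming the standard SigLIP API in transformers; the checkpoint ID, image path, and label set are placeholders:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"   # checkpoint ID assumed; check the Hub for the released names
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")          # placeholder image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]  # open-ended classes

inputs = processor(text=labels, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # shape: (num_images, num_texts)

# SigLIP is trained with a sigmoid loss, so per-pair scores go through a sigmoid,
# not a softmax over the label set.
probs = torch.sigmoid(logits)[0]
for label, p in zip(labels, probs):
    print(f"{p:.2%}  {label}")
```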
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

This is exciting... SigLIP was one of the best and most performant image encoders, used in a wide variety of applications, so a new version of SigLIP from the same team should be pretty good as well!
merve (@mervenoyann) 's Twitter Profile Photo

SigLIP 2 is the most powerful image-text encoder

you can use it to do
> image-to-image search
> text-to-image search
> image-to-text search
> image classification with open-ended classes
> train vision language models

we will show you how to do all this week 🤝
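A rough sketch of the text-to-image search workflow from the list above, using separate image and text embeddings from the same (assumed) SigLIP 2 checkpoint; the file names and query are placeholders:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"   # checkpoint ID assumed
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

# Embed a small image gallery once; in practice these vectors would go into an index.
paths = ["beach.jpg", "mountain.jpg", "city.jpg"]   # placeholder files
images = [Image.open(p) for p in paths]
img_inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    img_emb = F.normalize(model.get_image_features(**img_inputs), dim=-1)

# Embed a free-form text query and rank the gallery by cosine similarity.
query = "a snowy mountain at sunrise"
txt_inputs = processor(text=[query], padding="max_length", return_tensors="pt")
with torch.no_grad():
    txt_emb = F.normalize(model.get_text_features(**txt_inputs), dim=-1)

scores = (txt_emb @ img_emb.T).squeeze(0)
for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.3f}  {paths[idx]}")
```

Image-to-image and image-to-text search follow the same pattern, comparing whichever pair of embedding sets is needed.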
Boris Dayma 🖍️ (@borisdayma) 's Twitter Profile Photo

Caught up on the new SigLIP 2 paper 🤓

Cool things I learnt from it:
- finally using a patch size of 16 with resolutions that are multiples of 256! I was annoyed by the weird patch-14 / 224 sizes, which felt like an old ImageNet training augmentation artifact
- reference to LocCa for pretraining, which

Alexander Visheratin (@visheratin) 's Twitter Profile Photo

SigLIP 2 is indeed a better encoder than SigLIP! Over the last two weekends, I trained a new SOTA multilingual model - mexma-siglip2. It not only has improved performance but also an MIT license. Michael Tschannen Xiaohua Zhai Lucas Beyer (bl16) merve SkalskiP

André Araujo (@andrefaraujo) 's Twitter Profile Photo

Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to ViT-g, with spatial awareness, suitable for many multimodal AI applications. Can’t wait to see what the community will build with them!

Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

I’m so happy to announce Gemma 3 is out! 🚀

🌏Understands over 140 languages
👀Multimodal with image and video input
🤯LMArena score of 1338!
📏Context window of 128k

Available in AI Studio, Hugging Face, Ollama, Vertex, and your favorite OS tools 🚀 Download it today!
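For a quick local try-out, a minimal text-only sketch using the transformers chat interface; the model ID and prompt are assumptions, and the image/video inputs mentioned above require the larger instruction-tuned checkpoints rather than this one.

```python
from transformers import pipeline

# Model ID assumed for illustration; the 1B variant is text-only, larger Gemma 3 sizes add image input.
pipe = pipeline("text-generation", model="google/gemma-3-1b-it")

messages = [{"role": "user", "content": "Summarize what a vision-language encoder does in one sentence."}]
out = pipe(messages, max_new_tokens=64)

# The pipeline returns the full chat transcript; the last message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```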
Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now →

Michael Tschannen (@mtschannen) 's Twitter Profile Photo

We are presenting JetFormer at ICLR this morning, poster #190. Stop by if you’re interested in unified multimodal architectures!

Mario Lucic (@mariolucic_) 's Twitter Profile Photo

Massive advancements in video understanding with Gemini 2.5! ✨ Unlock new capabilities to process hours of video, summarize and retrieve key moments, generate animations, and even combine video with code for interactive experiences. Check the 🧵below for some cool use-cases.