Andreas Steiner (@andreaspsteiner)'s Twitter Profile
Andreas Steiner

@andreaspsteiner

Researching #ComputerVision at #GoogleDeepMind using JAX/Flax (github.com/google/flax). Views are my own.

ID: 1430128930875445249

Joined: 24-08-2021 11:25:00

81 Tweets

1.1K Followers

127 Following

Michael Tschannen (@mtschannen)

Decoder-only models only work with discrete tokens, right? 🤔 Excited to present

🎁GIVT: Generative Infinite-Vocabulary Transformers,

a simple way to generate arbitrary vector sequences with real-valued entries using transformer decoder-only models!

arxiv.org/abs/2312.02116

1/
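To make the "infinite vocabulary" idea concrete, here is a minimal sketch of how a decoder head along these lines could look: instead of a softmax over a finite vocabulary, the head predicts the parameters of a small Gaussian mixture over a real-valued d-dimensional vector and samples the next "token" from it. The class name, shapes, and the diagonal-Gaussian parameterization are illustrative assumptions, not code from the paper.

import torch
import torch.nn as nn

class GIVTStyleHead(nn.Module):
    # Hypothetical head: predict a k-component diagonal-Gaussian mixture over a
    # real-valued d-dimensional output vector, so the "vocabulary" is effectively infinite.
    def __init__(self, width: int, d: int, k: int):
        super().__init__()
        self.d, self.k = d, k
        # per mixture component: 1 logit + d means + d log-variances
        self.proj = nn.Linear(width, k * (1 + 2 * d))

    def forward(self, h):  # h: (batch, seq, width) decoder activations
        p = self.proj(h)
        logits, mu, logvar = p.split([self.k, self.k * self.d, self.k * self.d], dim=-1)
        mu = mu.unflatten(-1, (self.k, self.d))
        std = logvar.unflatten(-1, (self.k, self.d)).mul(0.5).exp()
        return logits, mu, std

    @torch.no_grad()
    def sample(self, h):
        logits, mu, std = self.forward(h)
        comp = torch.distributions.Categorical(logits=logits).sample()   # (batch, seq)
        idx = comp[..., None, None].expand(*comp.shape, 1, self.d)       # chosen component
        mu_c = mu.gather(-2, idx).squeeze(-2)
        std_c = std.gather(-2, idx).squeeze(-2)
        return mu_c + std_c * torch.randn_like(mu_c)                     # real-valued next vector

Training such a head would maximize the mixture log-likelihood of the target vectors in place of the usual cross-entropy loss.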
merve (@mervenoyann)

Welcome PaliGemma 2! 🤗

Google released PaliGemma 2, the best vision language model family, which comes in various sizes: 3B, 10B, 28B. Based on Gemma 2 and SigLIP, it comes with transformers support on day 0 🎁

Saying this model is amazing would be an understatement, keep reading ✨
Lucas Beyer (bl16) (@giffmana)

The fourth nice thing we* have for you this week: PaliGemma 2. It's also a perfect transition: this v2 was carried a lot more by Andreas Steiner, André Susano Pinto, and Michael Tschannen than by us. Crazy new sota tasks! Interesting res vs LLM size study! Better OCR! Less hallucination!

Ibrahim Alabdulmohsin | إبراهيم العبدالمحسن (@ibomohsin)

Attending #NeurIPS2024? If you're interested in multimodal systems, building inclusive & culturally aware models, and how fractals relate to LLMs, we've 3 posters for you. I look forward to presenting them on behalf of our GDM team @ Zurich & collaborators. Details below (1/4)

Lucas Beyer (bl16) (@giffmana)

Alex has been pulling a Star Wars here, with the JetFormer paper (Episodes 4-6) coming out before the Jet paper (Episodes 1-3).

Read here about the simple way of turning a ViT into a sota flow model:
Michael Tschannen (@mtschannen)

Check out our detailed report about *Jet* 🌊 - a simple, transformer-based normalizing flow architecture without bells and whistles. Jet is an important part of JetFormer's engine ⚙️ As a standalone model it is very tame and behaves predictably (e.g. when scaling it up).
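For background on the kind of invertible building block such a flow stacks, here is a minimal sketch of one affine coupling step: half of the dimensions pass through unchanged and parameterize an affine transform of the other half, so the log-determinant of the Jacobian is cheap to compute. In Jet the conditioning network is a transformer over image patches; the small MLP below is only a placeholder assumption to keep the sketch self-contained.

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # One invertible coupling step (illustrative): x1 passes through unchanged and
    # parameterizes an affine transform of x2, giving a triangular Jacobian whose
    # log-determinant is just the sum of the predicted log-scales.
    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        assert dim % 2 == 0
        # Placeholder conditioning network; Jet uses transformer blocks here.
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))  # predicts scale and shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                 # keep scales in a well-behaved range
        y2 = x2 * log_s.exp() + t
        log_det = log_s.sum(dim=-1)               # contribution to the flow log-likelihood
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (y2 - t) * torch.exp(-log_s)
        return torch.cat([y1, x2], dim=-1)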

Ibrahim Alabdulmohsin | إبراهيم العبدالمحسن (@ibomohsin)

🔥Excited to introduce RINS - a technique that boosts model performance by recursively applying early layers during inference without increasing model size or training compute flops! Not only does it significantly improve LMs, but also multimodal systems like SigLIP. 
(1/N)
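One plausible reading of "recursively applying early layers", sketched below; the function name, split point, and recursion count are illustrative assumptions rather than the paper's exact recipe.

import torch.nn as nn

def rins_style_forward(x, blocks: nn.ModuleList, split: int = 4, recursions: int = 2):
    # Illustrative sketch: run the first `split` blocks `recursions` times, reusing
    # the same weights, then run the remaining blocks once. The parameter count is
    # unchanged; only the effective depth at inference grows.
    early, late = blocks[:split], blocks[split:]
    for _ in range(recursions):
        for block in early:
            x = block(x)
    for block in late:
        x = block(x)
    return x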
merve (@mervenoyann)

Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥
> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Andreas Steiner (@andreaspsteiner)

Looking for a small or medium-sized VLM? PaliGemma 2 spans more than 150x of compute!

Not sure yet if you want to invest the time 🪄finetuning🪄 on your data? Give it a try with our ready-to-use "mix" checkpoints:

🤗 huggingface.co/blog/paligemma…
🎤 developers.googleblog.com/en/introducing…
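A quick way to try one of the mix checkpoints is through Hugging Face transformers; the checkpoint id and the short task-style prompt below follow the conventions described in the linked blog post, so treat both as assumptions to verify there.

# pip install transformers accelerate pillow
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma2-3b-mix-224"   # assumed checkpoint name, see the HF blog post
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")           # any local test image
prompt = "describe en"                      # mix checkpoints take short task-style prompts
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(out[0][prompt_len:], skip_special_tokens=True))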
Nitin Tiwari (@nstiwari21)

Now, you can run Google DeepMind PaliGemma 2 models directly in the browser with Hugging Face Transformers.js! Check out how I converted the latest paligemma2-3b-mix-224 model to ONNX and deployed it in a Node.js web app. ✨ #GemmaVerse #PaliGemma2 @GoogleDevExpert Omar Sanseviero

Andreas Steiner (@andreaspsteiner)

Gemma 3 - amazing multimodal performance at 4B, 12B, and 27B scale, with an LMSYS ELO ranking better than a leet score (1337), on a single GPU!

Delip Rao e/σ (@deliprao)

Highest intelligence compression we have seen in any open model. (Also beats o3-mini). Multimodal. Multilingual. Tool calls. Weights on huggingface. So many reasons to be excited about this!
