Andreas Steiner (@andreaspsteiner)'s Twitter Profile
Andreas Steiner

@andreaspsteiner

Researching #ComputerVision at #GoogleDeepMind using JAX/Flax (github.com/google/flax). Views are my own.

ID: 1430128930875445249

Joined: 24-08-2021 11:25:00

81 Tweets

1.1K Followers

127 Following

Michael Tschannen (@mtschannen)

Decoder-only models only work with discrete tokens, right? 🤔 Excited to present

🎁GIVT: Generative Infinite-Vocabulary Transformers,

a simple way to generate arbitrary vector sequences with real-valued entries using transformer decoder-only models!

arxiv.org/abs/2312.02116

1/
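To make the "infinite vocabulary" idea concrete, here is a minimal sketch of how a decoder head along these lines could look: instead of a softmax over a finite vocabulary, the head predicts the parameters of a small Gaussian mixture over a real-valued d-dimensional vector and samples the next "token" from it. The class name, shapes, and the diagonal-Gaussian parameterization are illustrative assumptions, not code from the paper.

import torch
import torch.nn as nn

class GIVTStyleHead(nn.Module):
    # Hypothetical head: predict a k-component diagonal-Gaussian mixture over a
    # real-valued d-dimensional output vector, so the "vocabulary" is effectively infinite.
    def __init__(self, width: int, d: int, k: int):
        super().__init__()
        self.d, self.k = d, k
        # per mixture component: 1 logit + d means + d log-variances
        self.proj = nn.Linear(width, k * (1 + 2 * d))

    def forward(self, h):  # h: (batch, seq, width) decoder activations
        p = self.proj(h)
        logits, mu, logvar = p.split([self.k, self.k * self.d, self.k * self.d], dim=-1)
        mu = mu.unflatten(-1, (self.k, self.d))
        std = logvar.unflatten(-1, (self.k, self.d)).mul(0.5).exp()
        return logits, mu, std

    @torch.no_grad()
    def sample(self, h):
        logits, mu, std = self.forward(h)
        comp = torch.distributions.Categorical(logits=logits).sample()   # (batch, seq)
        idx = comp[..., None, None].expand(*comp.shape, 1, self.d)       # chosen component
        mu_c = mu.gather(-2, idx).squeeze(-2)
        std_c = std.gather(-2, idx).squeeze(-2)
        return mu_c + std_c * torch.randn_like(mu_c)                     # real-valued next vector

Training such a head would maximize the mixture log-likelihood of the target vectors in place of the usual cross-entropy loss.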
merve (@mervenoyann)

Welcome PaliGemma 2! 🤗

Google released PaliGemma 2, the best vision language model family, which comes in various sizes: 3B, 10B, 28B. Based on Gemma 2 and SigLIP, it comes with transformers support on day 0 🎁

Saying this model is amazing would be an understatement, keep reading ✨
Lucas Beyer (bl16) (@giffmana)

The fourth nice thing we* have for you this week: PaliGemma 2. It's also a perfect transition: this v2 was carried a lot more by Andreas Steiner, André Susano Pinto, and Michael Tschannen than by us. Crazy new sota tasks! Interesting res vs LLM size study! Better OCR! Less hallucination!

Ibrahim Alabdulmohsin | إبراهيم العبدالمحسن (@ibomohsin)

Attending #NeurIPS2024? If you're interested in multimodal systems, building inclusive & culturally aware models, and how fractals relate to LLMs, we've 3 posters for you. I look forward to presenting them on behalf of our GDM team @ Zurich & collaborators. Details below (1/4)

Lucas Beyer (bl16) (@giffmana)

Alex has been pulling a Star Wars here, with the JetFormer paper (Episodes 4-6) coming out before the Jet paper (Episodes 1-3).

Read here about the simple way of turning a ViT into a sota flow model:
Michael Tschannen (@mtschannen)

Check out our detailed report about *Jet* 🌊 - a simple, transformer-based normalizing flow architecture without bells and whistles. Jet is an important part of JetFormer's engine ⚙️ As a standalone model it is very tame and behaves predictably (e.g. when scaling it up).
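For background on the kind of invertible building block such a flow stacks, here is a minimal sketch of one affine coupling step: half of the dimensions pass through unchanged and parameterize an affine transform of the other half, so the log-determinant of the Jacobian is cheap to compute. In Jet the conditioning network is a transformer over image patches; the small MLP below is only a placeholder assumption to keep the sketch self-contained.

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # One invertible coupling step (illustrative): x1 passes through unchanged and
    # parameterizes an affine transform of x2, giving a triangular Jacobian whose
    # log-determinant is just the sum of the predicted log-scales.
    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        assert dim % 2 == 0
        # Placeholder conditioning network; Jet uses transformer blocks here.
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))  # predicts scale and shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)                 # keep scales in a well-behaved range
        y2 = x2 * log_s.exp() + t
        log_det = log_s.sum(dim=-1)               # contribution to the flow log-likelihood
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (y2 - t) * torch.exp(-log_s)
        return torch.cat([y1, x2], dim=-1)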

Ibrahim Alabdulmohsin | إبراهيم العبدالمحسن (@ibomohsin)

🔥Excited to introduce RINS - a technique that boosts model performance by recursively applying early layers during inference without increasing model size or training compute flops! Not only does it significantly improve LMs, but also multimodal systems like SigLIP. 
(1/N)
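One plausible reading of "recursively applying early layers", sketched below; the function name, split point, and recursion count are illustrative assumptions rather than the paper's exact recipe.

import torch.nn as nn

def rins_style_forward(x, blocks: nn.ModuleList, split: int = 4, recursions: int = 2):
    # Illustrative sketch: run the first `split` blocks `recursions` times, reusing
    # the same weights, then run the remaining blocks once. The parameter count is
    # unchanged; only the effective depth at inference grows.
    early, late = blocks[:split], blocks[split:]
    for _ in range(recursions):
        for block in early:
            x = block(x)
    for block in late:
        x = block(x)
    return x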
merve (@mervenoyann)

Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥
> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Andreas Steiner (@andreaspsteiner)

Looking for a small or medium-sized VLM? PaliGemma 2 spans more than 150x of compute!

Not sure yet if you want to invest the time 🪄finetuning🪄 on your data? Give it a try with our ready-to-use "mix" checkpoints:

🤗 huggingface.co/blog/paligemma…
🎤 developers.googleblog.com/en/introducing…
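A quick way to try one of the mix checkpoints is through Hugging Face transformers; the checkpoint id and the short task-style prompt below follow the conventions described in the linked blog post, so treat both as assumptions to verify there.

# pip install transformers accelerate pillow
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma2-3b-mix-224"   # assumed checkpoint name, see the HF blog post
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")           # any local test image
prompt = "describe en"                      # mix checkpoints take short task-style prompts
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(out[0][prompt_len:], skip_special_tokens=True))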
Nitin Tiwari (@nstiwari21)

Now, you can run Google DeepMind PaliGemma 2 models directly in the browser with Hugging Face Transformers.js! Check out how I converted the latest paligemma2-3b-mix-224 model to ONNX and deployed it in a Node.js web app. ✨ #GemmaVerse #PaliGemma2 @GoogleDevExpert Omar Sanseviero

Andreas Steiner (@andreaspsteiner)

Gemma 3 - amazing multimodal performance at 4B, 12B, and 27B scale, with an LMSYS ELO ranking better than a leet score (1337), on a single GPU!

Delip Rao e/σ (@deliprao)

Highest intelligence compression we have seen in any open model. (Also beats o3-mini). Multimodal. Multilingual. Tool calls. Weights on huggingface. So many reasons to be excited about this!
