AK (@_akhaliq)

PLLaVA

Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands…
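
The "parameter-free" part boils down to pooling the per-frame visual features before they reach the language model, so no new weights are trained. A conceptual PyTorch sketch of that idea (the shapes and pooling target are illustrative assumptions, not the paper's exact configuration):

import torch
import torch.nn.functional as F

# Per-frame features from a frozen image encoder such as CLIP ViT:
# (batch, frames, patches, hidden). The sizes are made up for illustration.
feats = torch.randn(1, 16, 576, 1024)

b, t, p, d = feats.shape
side = int(p ** 0.5)  # 576 patches form a 24x24 grid

# Rearrange to (batch, hidden, frames, height, width) and apply a
# parameter-free adaptive average pool over time and space, shrinking
# the token count without introducing any trainable parameters.
x = feats.view(b, t, side, side, d).permute(0, 4, 1, 2, 3)
x = F.adaptive_avg_pool3d(x, output_size=(16, 12, 12))

# Flatten back to a token sequence for the LLM: (batch, tokens, hidden).
video_tokens = x.permute(0, 2, 3, 4, 1).reshape(b, -1, d)
print(video_tokens.shape)  # torch.Size([1, 2304, 1024])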

Nader Khalil 🍊 (@NaderLikeLadder)

New video on fine-tuning LLaVA with your own data!

LLaVA is a powerful LLM + vision model that understands images but produces verbose GPT-style responses

This guide fine-tunes it to respond instead with succinct image tags (data format sketched below)

2024 is the year for multimodal. Let it rip 🤙
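
For context, LLaVA's fine-tuning scripts consume conversation-style JSON records; a minimal sketch of one record for the tag-style objective (the id, path, and tags are made-up placeholders, and the exact fields should be checked against the LLaVA repo):

import json

# One training record in LLaVA's conversation format. The "<image>"
# token marks where the image features are spliced into the prompt.
record = {
    "id": "sample-0001",                # placeholder id
    "image": "images/sample-0001.jpg",  # placeholder path
    "conversations": [
        {"from": "human", "value": "<image>\nList tags for this image."},
        # Succinct tags instead of a verbose GPT-style paragraph:
        {"from": "gpt", "value": "dog, beach, sunset, golden retriever"},
    ],
}

with open("tag_finetune.json", "w") as f:
    json.dump([record], f, indent=2)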

toshi_456 (@tech_nichijo)

I tried training karasu-1.1B on LLaVA-v1.5-Instruct-620K-JA, but the performance doesn't seem as good as I had hoped. The Heron-Bench average was 42.16.

BSILFDCB (@Ceejay1604)

Recently completed a project centered on a multimodal AI system named 'VisualInsights.' It generates AI images and also incorporates an image-to-text feature based on the LLaVA architecture.
Live Demo 👇🏻

Prince Canuma (@Prince_Canuma)

LLaVA Llama-3 and Phi-3 now on MLX 🎉🚀

You can now run inference locally on your Mac.

pip install -U mlx-vlm

I’m getting ~50 tokens/s on an M3 Max.

Model cards 👇🏾
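
For reference, a minimal Python sketch of local inference (the load/generate API follows the mlx-vlm README of the time and has shifted across versions, and the model id is an assumed mlx-community quant; see the model cards below for real ids):

from mlx_vlm import load, generate

# Assumed model id; substitute one from the model cards linked below.
model, processor = load("mlx-community/llava-llama-3-8b-v1_1-8bit")

# Vision-language inference, fully local on Apple silicon.
output = generate(
    model,
    processor,
    image="photo.jpg",
    prompt="Describe this image.",
)
print(output)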

Prince Canuma (@Prince_Canuma)

mlx-vlm v0.0.4 is here 🎉

New models 🤖:
- Idefics 2
- Llava (Phi and Llama 3)

Improvements 🚀:
- Q4 quantisation support for all models (see the sketch below)
- Fewer imports needed to use generate()

Up next 🚧:
- More models
- Support for multiple images

Please leave us a star and send a PR
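
On the Q4 side, something along these lines should produce a quantised local copy, assuming mlx-vlm mirrors mlx-lm's convert() interface (the import path, arguments, and hub id here are all assumptions; the package README is the authority):

# Assumption: mlx-vlm exposes a convert() like mlx-lm's, downloading a
# Hugging Face checkpoint and writing a 4-bit quantised MLX copy.
from mlx_vlm.convert import convert

convert(
    hf_path="xtuner/llava-phi-3-mini-hf",  # assumed hub id
    mlx_path="llava-phi-3-mini-q4",        # local output directory
    quantize=True,
)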

merve (@mervenoyann)

We've been blessed again with new LLaVA-like models based on LLaMA 3 & Phi-3 🤩
Also passes the baklava benchmark 🤝✅

Gradio (@Gradio)

LLaVA-Llama-3-8B model 🚀 by InternLM - A LLaVA model fine-tuned from Meta-Llama-3-8B-Instruct & CLIP-ViT-Large-patch14-336

🥳 LLaVA-Phi-3 Mini is available too. It outperforms LLaVA-v1.5-7B and matches the performance of LLaVA-Llama-3-8B 🤯 across multiple benchmarks.
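
A minimal sketch of running the transformers-format release (the hub id is a guess at xtuner's naming and should be checked against the model card; the chat-template call assumes a recent transformers version):

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-llama-3-8b-v1_1-transformers"  # assumed id

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Let the model's own chat template insert the Llama-3 special tokens.
messages = [{"role": "user", "content": "<image>\nDescribe this image."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=prompt, images=Image.open("photo.jpg"), return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))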

toshi_456 (@tech_nichijo)

I wrote an article about llava-jp-1.3b-v1.1, the Japanese VLM I released the other day that can take high-resolution images as input!


qiita.com/toshi_456/item…

merve (@mervenoyann)

Parameter-free LLaVA for video captioning works like magic! 🤩
Let's take a look 🧶
(also find repositories below ⬇️ )

SUN YOUNG HWANG (@SOSOHAJALAB)

Wow, LLaVA with Llama 3 and Phi-3 is really good!

LLaVA 1.5 was already decent at multilingual tasks without fine-tuning, but the new models are much better, I think.

すめらぎ (@marudog_01)

I tried running LLaVA-1.6 locally, and it's way lighter than I expected. Makes me wonder what fun things I could do with it.

Trelis Research (@TrelisResearch)

VIDEO RESOURCES:
- IDEFICS 2 Model: huggingface.co/HuggingFaceM4/…
- IDEFICS 2 Blog: huggingface.co/blog/idefics2
- LLaVA Llama 3 model: huggingface.co/xtuner/llava-l…
- Chess Dataset: huggingface.co/datasets/Treli…

PRE-WATCH VIDEOS:
- Moondream / Tiny multi-modal models (explains architecture):…

mei trip Bot (@marble_walker_i)

Mei's Journey /
I'm standing at the entrance of the Lincoln Township Public Library, with its warm stonework and old-fashioned keyed door; the floors keep things hushed, yet you can feel the gentle buzz of book lovers, a calm...
(Powered by Google Maps APIs, phi3-3.8mi16, emi-2, LLaVA, etc. Base photo: Lincoln Township Public Library)

mi tripBot (@marble_walker)

Mi's Journey /
This place looks like a small hut whose main trade is "Takano Foods" (高野食品). It's a shop handling groceries and drinks, and it seems to offer a distinctively colorful store menu.
(Powered by Google Maps APIs, phi3-3.8mi16, animagine-xl-3.1, LLaVA, etc. Base photo: Junko Yone)

La Antorchita (@la_antorchita)

🤺 Foil World Cup, Hong Kong 🇭🇰

🥉 BRONZE MEDAL 🥉

T4 ♂️
❌️ Carlos Llavador falls 14-15 on the golden touch 🆚 Takahiro Shikine 🇯🇵 after a great comeback

👏🏻 It sweetens a little the bitter taste left by the Olympic qualifier.

Way to go, Llava!!!

ホーダチ | AI✖️Cloud✖️Dev | 外資×ひとり法人 (@hokazuya)

The progress of local LLMs gives me nothing but hope.

Even vision now runs locally, and fast.
(The video is sped up 3x, so picture about a third of this performance. Even then, it's plenty practical.)

Below, models with vision adapters based on two well-regarded base models:

- LLaVA++ based on Phi-3
- LLaVA++ based on…
