Prithiv Sakthi (@prithiv_003) 's Twitter Profile
Prithiv Sakthi

@prithiv_003

computer vision | multimodal ai | treating llm ִ ࣪𖤐

ID: 1578380425365401602

Link: http://hf.co/prithivmlmods
Joined: 07-10-2022 13:43:22

6.6K Tweets

232 Followers

745 Following

Prithiv Sakthi (@prithiv_003) 's Twitter Profile Photo

Made Nano Banana to transform freestyle drawings into image illustrations (from free-style drawing to image). It also supports other options like image generation and single or multiple image edits. Built with the Gemini API, powered by GCP.

App: …no-banana-aio-op72ohwdda-uw.a.run.app
Vivek Galatage (@vivekgalatage) 's Twitter Profile Photo

Understanding GPU Architecture from Cornell 

cvw.cac.cornell.edu/gpu-architectu…

During a low-level discussion at a casual meetup, many folks were interested in understanding GPUs more closely.

While CPUs optimize for complex control flow (see those big cores + caches), the GPUs maximize
Anand Bhattad (@anand_bhattad) 's Twitter Profile Photo

This is cool!

We’ve been building something along similar lines in academia: Generative Blocks World.

No magic here—just an intuitive pipeline grounded in decades of work on primitive decomposition in computer vision. The inspiration goes all the way back to one of the earliest
AiBattle (@aibattle_) 's Twitter Profile Photo

Another new Google Gemini model "Oceanreef" is being tested in LmArena

The model is likely related to the "Oceanstone" model, which appeared 2 days ago
Vivek Galatage (@vivekgalatage) 's Twitter Profile Photo

While researching GPU architecture further, I found Kostas Anagnostou's recent blog post, "GPU utilisation and performance improvements".

Quite interesting insights on GPU perf, read on!

interplayoflight.wordpress.com/2025/08/29/gpu…
merve (@mervenoyann) 's Twitter Profile Photo

I love MiniCPM-V 4.5, it's underrated 

it's only 8B yet great in factual correction + thinking 💬

as they claim, gpt-4o level VLM on-device 👏 great work OpenBMB
merve (@mervenoyann) 's Twitter Profile Photo

IBM just released a small Swiss Army knife for document models: granite-docling-258M 🔥

not only a document converter but also can do document question answering, understand multiple languages 🤯 

with Apache 2.0 license 👏
Ant Ling (@antling20041208) 's Twitter Profile Photo

⚡️Ling-flash-2.0⚡️ is now open source.
100B MoE LLM • only 6.1B active params
--> 3x faster than 36B dense (200+ tok/s on H20)
--> Beats ~40B dense LLM on complex reasoning
--> Powerful coding and frontend development
Small activation. Big performance.
Draw Things (@drawthingsapp) 's Twitter Profile Photo

1. This LoRA is called Qwen-Image-HeadshotX (source link below). It provides precise portrait rendering with a strong focus on realism.👇🏻
huggingface.co/prithivMLmods/…
Vivek Galatage (@vivekgalatage) 's Twitter Profile Photo

This has to be one of the best GPU programming resources I've found - the GPU Glossary from Modal breaks down complex concepts with clear visuals and explanations, from CUDA architecture to Tensor Cores to CTAs.

modal.com/gpu-glossary
DailyPapers (@huggingpapers) 's Twitter Profile Photo

GenExam: The first multidisciplinary text-to-image exam is now on Hugging Face

This new benchmark challenges T2I models with 1,000 rigorous, exam-style prompts across 10 subjects. It comes with ground-truth images and detailed scoring for semantic correctness and visual
DailyPapers (@huggingpapers) 's Twitter Profile Photo

ByteDance unveils SAIL-VL2, a SOTA vision-language foundation model.

It achieves comprehensive multimodal understanding and reasoning, outperforming at 2B & 8B scales.
Prithiv Sakthi (@prithiv_003) 's Twitter Profile Photo

After a while, I’ve migrated the app’s tech stack to make it compatible for deployment on HF Spaces. Nano Banana AIO (Wrapper) is used for manipulating free-style drawings into images (multi-image editing, image generation, etc.). 🍌🤗

Space: huggingface.co/spaces/prithiv…
DailyPapers (@huggingpapers) 's Twitter Profile Photo

A new era for 360-degree vision in AI, co-authored by Insta360!

PANORAMA introduces a revolutionary architecture for omnidirectional vision in embodied AI, offering holistic environmental awareness. It addresses key challenges in data, models, and applications.