Pritish Mishra (@pritishllm)'s Twitter Profile
Pritish Mishra

@pritishllm

ml engineer smallest.ai | working on LLMs, fine-tuning, multimodality and real-time voice agents.

ID: 1155845384612093953

Joined: 29-07-2019 14:19:49

879 Tweets

278 Followers

957 Following

Pritish Mishra (@pritishllm)

Incredible release by NVIDIA. This model will be invaluable for latency-sensitive applications. It's small, fast, strong, and fits on a single GPU; what more could you ask for?

Pritish Mishra (@pritishllm)

Me: Okay, it’s time for Gemma-4…
Google: Introducing T5Gemma-2 🥳
Me: Alright then, Gemma-4 next…
Google: Here’s Gemma Scope-2
Me: Definitely Gemma-4 this time…
Google: FunctionGemma.

Don’t get me wrong, these are all absolute banger releases. But I’ll be patiently

Pritish Mishra (@pritishllm)

why does even the SoTA closed-source model do this?

genuinely curious to know: is this a quantization issue or some inference-level kernel bug?

Pritish Mishra (@pritishllm)

I shifted from MoE to Dense models and I've never felt better. I have more energy. My skin is clearer. My eyesight has improved.

Pritish Mishra (@pritishllm)

the exact same Hindi sentence tokenized by different models

Qwen3: 222 tokens
GLM-4.7: 212 tokens
Nemotron: 84 tokens
Gemma: 66 tokens

GLM and Qwen3 are very strong models, but their tokenizers are predominantly trained on English text. As a result, they end up tokenizing almost
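
A minimal sketch (not from the tweet) of how such a comparison could be reproduced with Hugging Face AutoTokenizer; the model IDs and the example sentence are assumptions, and gated checkpoints like Gemma require accepting their license on the Hub first.

# Illustrative sketch: count how many tokens each tokenizer needs
# for the same Hindi sentence. Model IDs below are assumptions.
from transformers import AutoTokenizer

hindi_sentence = "मुझे हिंदी में किताबें पढ़ना बहुत पसंद है।"  # any Hindi sentence

model_ids = [
    "Qwen/Qwen3-8B",      # assumed Hub ID for a Qwen3 checkpoint
    "google/gemma-2-9b",  # assumed Hub ID for a Gemma checkpoint (gated)
]

for model_id in model_ids:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Exclude special tokens so only the sentence itself is counted.
    n_tokens = len(tokenizer.encode(hindi_sentence, add_special_tokens=False))
    print(f"{model_id}: {n_tokens} tokens")

Tokenizers whose training data skews English tend to fall back to byte- or character-level pieces on Devanagari text, which is what inflates the counts.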
Sudarshan Kamath (@kamath_sutra)

Announcing... Voice x Memory! We’re unpacking what makes agents listen, respond, and remember, or sometimes forget, and what that means for building better voice systems. We will move from a world of large LLMs remembering a lot of information towards smaller LMs with finite