Mitko Vasilev (@iotcoi)'s Twitter Profile
Mitko Vasilev

@iotcoi

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

ID: 1650982598

Joined: 06-08-2013 18:50:28

5.5K Tweets

1.1K Followers

1.1K Following

Mitko Vasilev (@iotcoi):

I run Claude Code with Qwen3 Coder Flash locally on my MacBook Air. It works offline: zero cloud, zero internet. No limits, all tokens on the house. Not great, not terrible: adequate performance for an on-device AI agent chewing through code on a 1.24 kg laptop.
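
For context on the wiring: Claude Code speaks the Anthropic API, so pairing it with a local model typically goes through a translation proxy; the local-inference half is just an OpenAI-compatible server. A minimal sketch of that half, assuming llama.cpp's llama-server or LM Studio is serving Qwen3 Coder Flash on localhost (the port and model name below are placeholders, not a confirmed setup):

```python
# Minimal sketch: query a local OpenAI-compatible server (e.g. llama.cpp's
# llama-server or LM Studio) running Qwen3 Coder Flash. The port and model
# name are assumptions -- match them to your own local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-coder-flash",  # hypothetical local model name
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```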

Mitko Vasilev (@iotcoi):

Say hello to OpenAI GPT OSS: almost Apache 2.0 licensed, reasoning-focused, agentic beast models in two flavors:
gpt-oss-120b: 117B params, but only 5.1B active via MoE
gpt-oss-20b: 21B params, with 3.6B active
Runs on `transformers`, `vLLM`, `llama.cpp`, or `ollama`.
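
A minimal sketch of the `transformers` route, assuming the Hugging Face checkpoint id `openai/gpt-oss-20b` and enough memory for the 21B-parameter MoE:

```python
# Minimal sketch: run gpt-oss-20b via the transformers text-generation
# pipeline. Assumes the openai/gpt-oss-20b checkpoint from Hugging Face;
# dtype and device placement are left to "auto".
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain MXFP4 quantization in one paragraph."}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message is the model's reply
```
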
Teknium (e/λ) (@teknium1):

OK, I think we will move the Hermes tool-calling format over to pure XML from JSON, now that vLLM and SGLang support that parser. Unfortunately too late for Hermes 4. But eventually. JSON is suboptimal for tool calls with code and long outputs that need escape sequences; XML
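
The escaping problem is easy to see in a toy comparison. The XML tag names below are illustrative only, not the actual Hermes schema:

```python
# Toy illustration of why JSON is awkward for code-heavy tool arguments:
# every quote, backslash, and newline must be escaped, while an XML-style
# wrapper can carry the payload nearly verbatim. Tag names are made up,
# not the actual Hermes format.
import json

code = 'print("path is C:\\\\tmp")\nprint("done")'

as_json = json.dumps({"name": "run_python", "arguments": {"code": code}})
print(as_json)  # quotes, newlines, and backslashes all become escape sequences

as_xml = f'<tool_call name="run_python">\n<code>\n{code}\n</code>\n</tool_call>'
print(as_xml)   # the payload survives mostly untouched
```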

Georgi Gerganov (@ggerganov):

LM Studio uses the upstream ggml implementation, which is significantly better and well optimized. Looking at ollama's modifications to ggml, they have too much branching in their MXFP4 kernels, and the attention-sinks implementation is really inefficient. Along with other
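
For readers new to the term: an attention sink (as used by GPT OSS) is an extra learned logit per head that joins the softmax and can soak up probability mass. A minimal numpy sketch of the concept only, not of the ggml or ollama kernels:

```python
# Minimal numpy sketch of softmax with an attention sink: one extra learned
# logit joins the softmax (so it can absorb probability mass) and is then
# dropped before the weighted sum over values. Concept only, not a kernel.
import numpy as np

def attention_with_sink(q, k, v, sink_logit):
    scores = q @ k.T / np.sqrt(q.shape[-1])                      # (T, T) logits
    sink = np.full((scores.shape[0], 1), sink_logit)             # sink column
    scores = np.concatenate([scores, sink], axis=-1)
    m = scores.max(axis=-1, keepdims=True)                       # stable softmax
    w = np.exp(scores - m)
    w /= w.sum(axis=-1, keepdims=True)
    return w[:, :-1] @ v                                         # drop the sink

T, d = 4, 8
rng = np.random.default_rng(0)
out = attention_with_sink(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                          rng.normal(size=(T, d)), sink_logit=1.0)
print(out.shape)  # (4, 8)
```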

antirez bsky social (@antirez):

Prediction: the OpenAI release of GPT OSS 120/20 will weaken the position of US AI OSS in comparison to the Chinese movement and results. And this is happening after the Llama 4 fiasco: multiplicative effects.

the tiny corp (@__tinygrad__):

GPUs provide tons of profiling information at the assembly level. A lesson from hacking: dynamic program analysis > static program analysis.

Awni Hannun (@awnihannun):

Superintelligence in your pocket will happen. And if you want to accelerate the timeline, hit me up. We're working on this at every layer with MLX.

cloud (@cloud11665):

With the recent release of GPT-OSS bringing mxfp4 quantization into the mainstream, I've decided to experiment with optimizing it on the CPU using AVX512 and got a 44x speedup over the standard way of computing an fp4 dot product (1/4)

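For orientation, the "standard way" being beaten is essentially dequantize-then-multiply: MXFP4 packs FP4 E2M1 values in blocks of 32 that share one power-of-two scale. A naive numpy reference of that baseline (not the AVX512 kernel from the thread):

```python
# Reference (unoptimized) MXFP4-style dot product: 4-bit E2M1 values in
# blocks of 32 sharing one power-of-two scale. This is the naive
# dequantize-then-multiply baseline, not the optimized AVX512 kernel.
import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def dequant_mxfp4(nibbles, block_exps):
    signs = np.where(nibbles >> 3, -1.0, 1.0)            # high bit is the sign
    vals = signs * E2M1[nibbles & 0x7]                   # low 3 bits index the LUT
    scales = np.repeat(2.0 ** block_exps.astype(np.float64), 32)
    return vals * scales

def mxfp4_dot(a_nib, a_exp, b_nib, b_exp):
    return float(dequant_mxfp4(a_nib, a_exp) @ dequant_mxfp4(b_nib, b_exp))

rng = np.random.default_rng(1)
n = 64  # two blocks of 32
a, b = rng.integers(0, 16, n, dtype=np.uint8), rng.integers(0, 16, n, dtype=np.uint8)
ae, be = rng.integers(-2, 3, n // 32), rng.integers(-2, 3, n // 32)
print(mxfp4_dot(a, ae, b, be))
```
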
Mitko Vasilev (@iotcoi):

Earlier today, humanity faced a critical threat from a catastrophic chart crime. I asked my local Qwen3 Coder Flash to fix it. Sleep well, fellow humans. The visualization singularity is now nigh, and it runs with zero warnings.

Mitko Vasilev (@iotcoi):

Downloading AI models from GitHub instead of HuggingFace. Using Azure AI instead of the HF inference providers. 2025, o tempora, o mores!

Shojaei (@realshojaei):

I am uninstalling ollama for good!!
I ran a simple eval (~360 questions) with the exact same GGUF file,
one time with ollama and one time with LM Studio.
Look at the score and latency (first one is LM Studio).
Also, with ollama I experience higher VRAM usage.
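
This kind of comparison is straightforward to reproduce; a minimal sketch hitting both servers' OpenAI-compatible endpoints with the same prompt, assuming the default ports (ollama on 11434, LM Studio on 1234) and placeholder model names:

```python
# Minimal sketch of an apples-to-apples latency check against two local
# OpenAI-compatible servers serving the same GGUF. Default ports assumed:
# ollama on 11434, LM Studio on 1234. Model names must match your setup.
import time
from openai import OpenAI

ENDPOINTS = {
    "ollama":    ("http://localhost:11434/v1", "my-model"),  # placeholder model names
    "lm-studio": ("http://localhost:1234/v1",  "my-model"),
}

prompt = "What is the capital of Australia? Answer with one word."

for name, (base_url, model) in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key="not-needed")
    t0 = time.perf_counter()
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    print(f"{name}: {time.perf_counter() - t0:.2f}s -> {resp.choices[0].message.content!r}")
```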