Mitko Vasilev (@iotcoi)'s Twitter Profile
Mitko Vasilev

@iotcoi

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

ID: 1650982598

Joined: 06-08-2013 18:50:28

5.5K Tweets

1.1K Followers

1.1K Following

Mitko Vasilev (@iotcoi):

I run Claude Code with Qwen3 Coder Flash locally on my MacBook Air. It works offline: zero cloud, zero internet. No limits, all tokens on the house. Not great, not terrible: adequate performance for an on-device AI agent chewing through code on a 1.24 kg laptop.
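
For context on the wiring: Claude Code speaks the Anthropic API, so pairing it with a local model typically goes through a translation proxy; the local-inference half is just an OpenAI-compatible server. A minimal sketch of that half, assuming llama.cpp's llama-server or LM Studio is serving Qwen3 Coder Flash on localhost (the port and model name below are placeholders, not a confirmed setup):

```python
# Minimal sketch: query a local OpenAI-compatible server (e.g. llama.cpp's
# llama-server or LM Studio) running Qwen3 Coder Flash. The port and model
# name are assumptions -- match them to your own local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-coder-flash",  # hypothetical local model name
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(resp.choices[0].message.content)
```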

Mitko Vasilev (@iotcoi):

Say hello to OpenAI GPT OSS: almost Apache 2.0 licensed, reasoning-focused, agentic beast models in two flavors:
gpt-oss-120b: 117B params, but only 5.1B active via MoE
gpt-oss-20b: 21B params, with 3.6B active
Runs on `transformers`, `vLLM`, `llama.cpp`, or `ollama`.
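
A minimal sketch of the `transformers` route, assuming the Hugging Face checkpoint id `openai/gpt-oss-20b` and enough memory for the 21B-parameter MoE:

```python
# Minimal sketch: run gpt-oss-20b via the transformers text-generation
# pipeline. Assumes the openai/gpt-oss-20b checkpoint from Hugging Face;
# dtype and device placement are left to "auto".
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain MXFP4 quantization in one paragraph."}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message is the model's reply
```
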
Teknium (e/λ) (@teknium1):

OK, I think we will move the Hermes tool-calling format over to pure XML from JSON, now that vLLM and SGLang support that parser. Unfortunately too late for Hermes 4. But eventually. JSON is suboptimal for tool calls with code and long outputs that need escape sequences; XML
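
The escaping problem is easy to see in a toy comparison. The XML tag names below are illustrative only, not the actual Hermes schema:

```python
# Toy illustration of why JSON is awkward for code-heavy tool arguments:
# every quote, backslash, and newline must be escaped, while an XML-style
# wrapper can carry the payload nearly verbatim. Tag names are made up,
# not the actual Hermes format.
import json

code = 'print("path is C:\\\\tmp")\nprint("done")'

as_json = json.dumps({"name": "run_python", "arguments": {"code": code}})
print(as_json)  # quotes, newlines, and backslashes all become escape sequences

as_xml = f'<tool_call name="run_python">\n<code>\n{code}\n</code>\n</tool_call>'
print(as_xml)   # the payload survives mostly untouched
```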

Georgi Gerganov (@ggerganov):

LM Studio uses the upstream ggml implementation, which is significantly better and well optimized. Looking at ollama's modifications to ggml, they have too much branching in their MXFP4 kernels, and the attention-sinks implementation is really inefficient. Along with other
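
For readers new to the term: an attention sink (as used by GPT OSS) is an extra learned logit per head that joins the softmax and can soak up probability mass. A minimal numpy sketch of the concept only, not of the ggml or ollama kernels:

```python
# Minimal numpy sketch of softmax with an attention sink: one extra learned
# logit joins the softmax (so it can absorb probability mass) and is then
# dropped before the weighted sum over values. Concept only, not a kernel.
import numpy as np

def attention_with_sink(q, k, v, sink_logit):
    scores = q @ k.T / np.sqrt(q.shape[-1])                      # (T, T) logits
    sink = np.full((scores.shape[0], 1), sink_logit)             # sink column
    scores = np.concatenate([scores, sink], axis=-1)
    m = scores.max(axis=-1, keepdims=True)                       # stable softmax
    w = np.exp(scores - m)
    w /= w.sum(axis=-1, keepdims=True)
    return w[:, :-1] @ v                                         # drop the sink

T, d = 4, 8
rng = np.random.default_rng(0)
out = attention_with_sink(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                          rng.normal(size=(T, d)), sink_logit=1.0)
print(out.shape)  # (4, 8)
```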

antirez bsky social (@antirez):

Prediction: the OpenAI release of GPT OSS 120/20 will weaken the position of US AI OSS in comparison to the Chinese movement and results. And this is happening after the Llama 4 fiasco: multiplicative effects.

the tiny corp (@__tinygrad__):

GPUs provide tons of profiling information at the assembly level. A lesson from hacking: dynamic program analysis > static program analysis.

Awni Hannun (@awnihannun):

Superintelligence in your pocket will happen. And if you want to accelerate the timeline, hit me up. We're working on this at every layer with MLX.

cloud (@cloud11665):

With the recent release of GPT-OSS bringing mxfp4 quantization into the mainstream, I've decided to experiment with optimizing it on the CPU using AVX512 and got a 44x speedup over the standard way of computing an fp4 dot product (1/4)

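For orientation, the "standard way" being beaten is essentially dequantize-then-multiply: MXFP4 packs FP4 E2M1 values in blocks of 32 that share one power-of-two scale. A naive numpy reference of that baseline (not the AVX512 kernel from the thread):

```python
# Reference (unoptimized) MXFP4-style dot product: 4-bit E2M1 values in
# blocks of 32 sharing one power-of-two scale. This is the naive
# dequantize-then-multiply baseline, not the optimized AVX512 kernel.
import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def dequant_mxfp4(nibbles, block_exps):
    signs = np.where(nibbles >> 3, -1.0, 1.0)            # high bit is the sign
    vals = signs * E2M1[nibbles & 0x7]                   # low 3 bits index the LUT
    scales = np.repeat(2.0 ** block_exps.astype(np.float64), 32)
    return vals * scales

def mxfp4_dot(a_nib, a_exp, b_nib, b_exp):
    return float(dequant_mxfp4(a_nib, a_exp) @ dequant_mxfp4(b_nib, b_exp))

rng = np.random.default_rng(1)
n = 64  # two blocks of 32
a, b = rng.integers(0, 16, n, dtype=np.uint8), rng.integers(0, 16, n, dtype=np.uint8)
ae, be = rng.integers(-2, 3, n // 32), rng.integers(-2, 3, n // 32)
print(mxfp4_dot(a, ae, b, be))
```
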
Mitko Vasilev (@iotcoi):

Earlier today, humanity faced a critical threat from a catastrophic chart crime. I asked my local Qwen3 Coder Flash to fix it. Sleep well, fellow humans. The visualization singularity is now nigh, and it runs with zero warnings.

Mitko Vasilev (@iotcoi):

Downloading AI models from GitHub instead of HuggingFace. Using Azure AI instead of the HF inference providers. 2025, o tempora, o mores!

Shojaei (@realshojaei):

I am uninstalling ollama for good!!
I ran a simple eval (~360 questions) with the exact same GGUF file,
one time with ollama and one time with LM Studio.
Look at the score and latency (first one is LM Studio).
Also, with ollama I experience higher VRAM usage.
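
This kind of comparison is straightforward to reproduce; a minimal sketch hitting both servers' OpenAI-compatible endpoints with the same prompt, assuming the default ports (ollama on 11434, LM Studio on 1234) and placeholder model names:

```python
# Minimal sketch of an apples-to-apples latency check against two local
# OpenAI-compatible servers serving the same GGUF. Default ports assumed:
# ollama on 11434, LM Studio on 1234. Model names must match your setup.
import time
from openai import OpenAI

ENDPOINTS = {
    "ollama":    ("http://localhost:11434/v1", "my-model"),  # placeholder model names
    "lm-studio": ("http://localhost:1234/v1",  "my-model"),
}

prompt = "What is the capital of Australia? Answer with one word."

for name, (base_url, model) in ENDPOINTS.items():
    client = OpenAI(base_url=base_url, api_key="not-needed")
    t0 = time.perf_counter()
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    print(f"{name}: {time.perf_counter() - t0:.2f}s -> {resp.choices[0].message.content!r}")
```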