Sudo su (@sudoingx)'s Twitter Profile
Sudo su

@sudoingx

Solo dev founder crafting fast ML/AI compute platform & Style Assistant. Sharing code snippets, founder lessons, & build-in-public wins 🔥

ID: 1555661341914198016

Joined: 05-08-2022 21:05:57

185 Tweets

83 Followers

153 Following


i've been wanting to run this comparison for weeks. dense vs MoE. same param count. same GPU. completely different architecture.

here's what caught my eye. hermes 4.3. 36B dense. 93.8% on MATH-500. 512K context. every single parameter active on every forward pass. no routing.

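the dense-vs-MoE gap above can be put in numbers. a minimal sketch, assuming the usual MoE naming convention where "A3B" in Qwen3.5-35B-A3B means roughly 3B parameters active per token (exact routing details vary by model):

```python
# Per-token compute scales with *active* parameters, not total.
# Assumption: "A3B" ~= 3B params routed/active per forward pass.

def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters touched on each forward pass."""
    return active_b / total_b

dense = active_fraction(36, 36)  # hermes 4.3: every param, every token
moe = active_fraction(35, 3)     # Qwen3.5-35B-A3B: only routed experts

print(f"dense active fraction: {dense:.0%}")  # 100%
print(f"MoE active fraction:   {moe:.0%}")    # ~9%
```

same order of total params, roughly an order of magnitude apart in per-token compute. that's the whole trade: dense buys consistency, MoE buys speed.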

qwen just dropped 4 new models. 0.8B runs on a phone. 9B runs on 6GB RAM. same Qwen3.5 family, same 256K context, all the way down.

i just finished benchmarking the 35B MoE across 15 GPUs and published the full breakdown. now the entire family is here. 0.8B, 2B, 4B, 9B.
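a quick sanity check on why 9B fits in 6GB. this is a rule-of-thumb sketch, assuming roughly 4.5 bits per parameter for a Q4_K_M-style quant (actual file sizes vary by quant mix), with the remaining headroom going to KV cache and runtime buffers:

```python
# Rough memory needed just to hold the weights at ~4-bit quantization.
# Assumption: ~4.5 bits/param, a common ballpark for Q4_K_M-style quants.

def weights_gb(params_b: float, bits_per_param: float = 4.5) -> float:
    """Approximate weight footprint in GB for a given param count (billions)."""
    return params_b * bits_per_param / 8

for size in (0.8, 2, 4, 9):
    print(f"{size}B -> ~{weights_gb(size):.1f} GB of weights")
```

9B lands around ~5.1 GB of weights, which is how it squeezes onto a 6GB device once you keep the context modest.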


124 tok/s on vLLM with AWQ 4-bit. beating the llama.cpp 112 tok/s number on the same RTX 3090. Qwen3.5-35B-A3B. different engine, different quant, fp8 KV cache instead of q8_0. same GPU. same model. haven't tested vLLM myself yet but it's on the list. if anyone else can verify
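for anyone who wants to try reproducing that number, here's the kind of launch line it implies. a sketch, not a verified repro: the model path is a placeholder for whichever AWQ build of Qwen3.5-35B-A3B you're testing, and the flags are standard vLLM options for AWQ quantization and an fp8 KV cache:

```shell
# Hypothetical launch fragment -- substitute your own AWQ model path.
vllm serve <awq-model-path> \
  --quantization awq \
  --kv-cache-dtype fp8 \
  --max-model-len 32768
```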


testing hermes 4.3 36B through opencode harness. 24GB VRAM. 22K usable context. within 10 tool calls it hit the wall.

the model is 21.8GB at Q4_K_M. on 24GB that forces 32K context for usable speed. but the agent eats 10K tokens for system prompt and tool definitions. leaves 22K


hey if you're thinking about running hermes 4.3 36B as a coding agent on a single RTX 3090, let me save you 24 minutes. the model is 21.8GB at Q4_K_M. on 24GB VRAM that leaves room for 32K context. sounds workable until the agent eats 10K tokens for system prompt and tools. you