Maximilian Bode (@mxpbode)'s Twitter Profile
Maximilian Bode

@mxpbode

Associate Partner @tngtech

ID: 1011857376062910464

Website: https://maximilianbo.de · Joined: 27-06-2018 06:22:52

52 Tweets

81 Followers

116 Following

TNG Technology Consulting GmbH (@tngtech):

How DeepSeek-R1 thinks: Activation patterns flowing through its Mixture-of-Experts layers while processing the token stream "What happened last year at Berlin Wall?". The Matrix...
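
For readers curious how such activation patterns can be captured, here is a minimal sketch, assuming a transformers-style MoE model whose blocks expose a gate submodule producing per-token router logits. The module naming and top-k value are assumptions; DeepSeek-R1's actual layout differs in detail.

```python
# Hedged sketch: trace which routed experts fire per token via forward hooks.
# Assumes gate modules are named "<block>.gate" and output router logits of
# shape (num_tokens, num_experts); adapt the names to the actual checkpoint.
import torch

routing_trace = []  # (layer_name, top-k expert ids per token)

def make_hook(layer_name, top_k=8):
    def hook(module, inputs, output):
        logits = output[0] if isinstance(output, tuple) else output
        experts = torch.topk(logits, k=top_k, dim=-1).indices
        routing_trace.append((layer_name, experts.detach().cpu()))
    return hook

def attach_gate_hooks(model, top_k=8):
    handles = []
    for name, module in model.named_modules():
        if name.endswith(".gate"):  # assumption about gate naming
            handles.append(module.register_forward_hook(make_hook(name, top_k)))
    return handles

# After one forward pass over the prompt, routing_trace holds one
# heat-mappable (tokens x experts) pattern per MoE layer.
```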

TNG Technology Consulting GmbH (@tngtech):

DeepSeek uploaded a new model on huggingface: DeepSeek-Prover-V2

It seems the architecture is identical to the V3 and R1 models, because:

model_config.py shows no differences, and the safetensor index files are also the same.

One minor diff is a new experimental feature in
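
As a rough illustration of that kind of check, one could diff the relevant files across the two Hugging Face repos. In the sketch below, the repo IDs and the choice of config.json (rather than the tweet's model_config.py) are assumptions:

```python
# Sketch: compare the model config and the safetensors index of two repos.
# Repo IDs are assumptions, not verified here.
import json
from huggingface_hub import hf_hub_download

def fetch_json(repo_id, filename):
    with open(hf_hub_download(repo_id, filename)) as f:
        return json.load(f)

prover = "deepseek-ai/DeepSeek-Prover-V2-671B"
v3 = "deepseek-ai/DeepSeek-V3"

for fname in ("config.json", "model.safetensors.index.json"):
    same = fetch_json(prover, fname) == fetch_json(v3, fname)
    print(f"{fname}: {'identical' if same else 'differs'}")
```
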
kalomaze (@kalomaze):

this is kind of absurd, and totally flew under my radar
you can selectively blacklist ~0.07% of DeepSeek R1's experts, the ones most associated with refusing
this makes the model... better, according to MT-Bench, most prominently for coding
(though fyi, MT-Bench is a bit dated)
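
A minimal sketch of what such a blacklist could look like mechanically (not kalomaze's actual implementation): mask the blacklisted experts' router logits to -inf before top-k selection, so those experts can never be routed to.

```python
# Hedged sketch of expert blacklisting in a MoE router: blacklisted experts
# get -inf logits before top-k, so they are never selected; gate weights are
# renormalized over the surviving experts.
import torch

def route_with_blacklist(router_logits, blacklist, top_k=8):
    """router_logits: (num_tokens, num_experts); blacklist: list of expert ids."""
    masked = router_logits.clone()
    masked[:, blacklist] = float("-inf")      # remove refusal-linked experts
    weights, experts = torch.topk(masked, k=top_k, dim=-1)
    weights = torch.softmax(weights, dim=-1)  # renormalize over survivors
    return weights, experts

# e.g. ~0.07% of R1's thousands of routed experts is only a handful of
# (layer, expert) pairs model-wide.
logits = torch.randn(4, 256)
w, e = route_with_blacklist(logits, blacklist=[17])
```
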
Maximilian Bode (@mxpbode):

Having a blast in beautiful Estes Park, CO. It was a real pleasure sharing my perspective on how LLMs are transforming software engineering, and why running your own GPU stack can be a game-changer. Thanks to everyone who came with smart questions and big ideas! #LambdaConf2025

TNG Technology Consulting GmbH (@tngtech):

More evidence for the effectiveness of the Chimera construction method:

Taking DeepSeek's R1-0528 release, we started benchmarking new Chimera variants on AIME-24 and SimpleQA.

R1-0528 significantly improves AIME performance from 79.8 to 91.4 while doubling the amount of output
TNG Technology Consulting GmbH (@tngtech):

Assembly of Experts: Our linear-time 671B Chimera LLM construction paper should soon appear on arxiv.org.
We are at the AI Engineer fair in SFO until tomorrow, so for a chat -> DM ;-)
TNG Technology Consulting GmbH (@tngtech):

We are posting our new paper "Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors" on Hugging Face while waiting for arXiv.org.
We explain how we constructed the 671B R1T Chimera child model from the great DeepSeek V3-0324
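
The paper has the details; as a purely illustrative sketch of an assembly-of-experts-style construction (the tensor-selection rule and interpolation coefficient here are assumptions, not the published recipe):

```python
# Illustrative sketch of an assembly-of-experts style merge: build a child
# state dict tensor-by-tensor, blending routed expert weights between the
# parents and taking everything else from the base parent. One pass over the
# weights, hence linear time in model size.
import torch

def assemble(base_sd, donor_sd, lam=0.5, expert_key="mlp.experts."):
    child = {}
    for name, w_base in base_sd.items():
        if expert_key in name:  # routed expert tensor: blend the parents
            child[name] = (1 - lam) * w_base + lam * donor_sd[name]
        else:                   # attention, embeddings, router, etc.: keep base
            child[name] = w_base.clone()
    return child
```
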
TNG Technology Consulting GmbH (@tngtech):

Today we release DeepSeek-TNG R1T2 Chimera.

This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, namely R1-0528, R1, and V3-0324.

R1T2 operates at a sweet spot in intelligence vs. output token length. It appears to be...

* about 20% faster than R1, and
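
Conceptually, the step from two parents to three can be pictured as a convex combination of corresponding weight tensors; the coefficients below are placeholders, not the released R1T2 recipe.

```python
# Sketch of a three-parent merge: a convex combination of corresponding
# tensors from the parents' state dicts. Coefficients are placeholders.
def assemble_tri(parents, coeffs):
    assert abs(sum(coeffs) - 1.0) < 1e-6  # convex combination
    return {
        name: sum(c * sd[name] for c, sd in zip(coeffs, parents))
        for name in parents[0].keys()
    }

# Hypothetical usage with the three parents named in the tweet:
# child = assemble_tri([sd_r1_0528, sd_r1, sd_v3_0324], coeffs=[0.5, 0.25, 0.25])
```
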
TNG Technology Consulting GmbH (@tngtech):

DeepSeek-TNG-R1T2-Chimera is currently the #1 trending model on OpenRouter, the platform that Andrej Karpathy called the "transfer switch of AI".

Our Assembly-of-Experts method pushes the Pareto frontier between model intelligence and inference cost.

Thanks again to the Open
TNG Technology Consulting GmbH (@tngtech):

FAZ article: "Der Boxkampf der KI-Tiger" ("The Boxing Match of the AI Tigers") with assessments from TNG

A recent article in the Frankfurter Allgemeine covers the World Artificial Intelligence Conference in Shanghai, providing insights into the significant new Chinese open-source #AI models such as Qwen3, K2, and DeepSeek. TNG
TNG Technology Consulting GmbH (@tngtech):

News from the Aider Discord regarding DeepSeek-TNG R1T2 Chimera's performance on the Aider Polyglot benchmark, courtesy of benchmark wizard neolithic5452 and the magic Unsloth AI quantizations:

- 2-bit UD-IQ2_M: 60.0%
- 4-bit Q4_K_XL: 62.7%
- 8-bit: 64.4%

This seems to be the
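
For context on what those quantization levels mean in practice, a hedged sketch of loading one of the Unsloth GGUF quants with llama-cpp-python; the file name and parameters are assumptions, and a 671B model needs substantial memory even at 2-bit.

```python
# Sketch: run a prompt against a local GGUF quant via llama-cpp-python.
# The model file name is hypothetical; multi-file quants may need merging
# or the first shard's path, depending on the llama.cpp version.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-TNG-R1T2-Chimera-UD-IQ2_M.gguf",  # hypothetical name
    n_ctx=8192,
)
out = llm("Write a function that parses an ISO-8601 date.", max_tokens=512)
print(out["choices"][0]["text"])
```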