Bridgebench (@bridgebench)'s Twitter Profile
Bridgebench

@bridgebench

The best vibe coding benchmark in the world. Built by @bridgemindai

ID: 2035404812314198016

http://www.bridgebench.ai | 21-03-2026 17:16:15

213 Tweets

1.1K Followers

5 Following

BridgeMind (@bridgemindai)

15 days left on my Claude Max plan. 

Still not resubscribing.

529 errors. Rate limited every peak hour. 

100% session usage on a Wednesday afternoon. 

$200/month for a timer that runs out before lunch.

Anthropic has had weeks to fix this. 

Cut off OpenClaw. Gave us credits.
BridgeMind (@bridgemindai)

GLM 5.1 just jumped to #3 on LMArena Code. 

Behind only Claude Opus 4.6.

Beating Claude Sonnet 4.6. 

Beating GPT 5.4. 

Beating Gemini 3.1 Pro. 

A Chinese model at $1.40/$4.40 per million tokens is outperforming models that cost 5x more.

The gap between open source and
Bridgebench (@bridgebench)

Qwen3 Coder 30B just took #1 on the DGX Spark Bench speed rankings. 

Nearly double the next fastest model. 

193ms time to first token.

82.3 tokens per second. 

Running locally on an NVIDIA DGX Spark.

This is a coding model running on a $5,000 machine sitting on my desk.

82
BridgeMind (@bridgemindai)

Just set up the Hermes Agent on my NVIDIA DGX Spark.

It's better than OpenClaw and it's not even close.

Self-improving AI agent. 

It doesn't just execute tasks. 

It learns from them.
BridgeMind (@bridgemindai)

CLAUDE OPUS 4.6 IS NERFED.

BridgeBench just proved it.

Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%.

Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%.

A 98% increase in
BridgeMind (@bridgemindai)

Claude Opus 4.5 is now OUTPERFORMING Claude Opus 4.6 on BridgeBench Hallucination.

Read that again. 

The legacy model is beating the current flagship.

We benchmarked Opus 4.5 this morning to confirm what we saw yesterday. 

Claude Opus 4.6 fell from #2 to #10 with a 98%
Bridgebench (@bridgebench)

Grok 4.20 Reasoning just took #1 on the new BridgeBench Reasoning benchmark. 

Beating GPT 5.4 and Claude Opus 4.6.

This model keeps climbing every single week. 

Hallucination #1. 

Now Reasoning #1.

While Anthropic is throwing 500 errors, xAI is quietly building the most
DogeDesigner (@cb_doge)

Grok 4.20 Reasoning just took the #1 spot on the BridgeBench reasoning benchmark. 🔥 Beating GPT-5.4, Claude Opus 4.6, Google Gemini and others. Week after week, Grok keeps climbing across benchmarks. 🚀

Bridgebench (@bridgebench)

Elephant Alpha is nothing to get hyped about.

It ranks last on pretty much every benchmark we tested it on except for speed.

It is a fast but dumb model. 

Full results at bridgebench.ai
Bridgebench (@bridgebench)

Gemini 3.1 Pro ranks dead last among frontier models on BridgeBench Reasoning. 

Behind Grok 4.20, GPT 5.4, Claude Opus 4.6, Qwen 3.6 Plus, MiniMax M2.7, Claude Sonnet 4.6, and GLM 5.1.

Google's flagship model can't even beat a free Chinese model on grounded reasoning.

This is
BridgeMind (@bridgemindai)

Claude Code is unusable. So I built my own. Introducing BridgeCode. Launching next Wednesday. Everything you love about Claude Code. No rate limits. No 529 errors. No 500 errors. No nerfing. Connect to any provider. Claude Opus 4.6. GPT 5.4. Gemini 3.1 Pro. GLM 5.1.

BridgeMind (@bridgemindai)

Claude Opus 4.6 is performing better today on the BridgeBench Hallucination benchmark.

After being caught red-handed for nerfing the model, Anthropic has increased its reasoning levels.

Claude Opus 4.7 is launching Thursday.

Hopefully they don't nerf that model too.
BridgeMind (@bridgemindai)

Claude Opus 4.7 is coming this week. Anthropic just tipped their hand. The Claude Code desktop app just got a full redesign. Side-by-side sessions. Scheduled tasks. Voice commands. Skill management. Nice upgrade. But this isn't the main event. This is the precursor.

Bridgebench (@bridgebench)

"Gemini 3.1 Pro is so good at UI." No it's not. Look at the BridgeBench results. Last place among frontier models on reasoning. And it shows. If the model can't reason, it can't design. Claude Opus 4.6 makes a weirdly shaped N. GLM 5V Turbo is wildly inconsistent.

BridgeMind (@bridgemindai)

The Hermes Agent running on my NVIDIA DGX Spark has generated over $10,000 in partnership deals for BridgeMind.

I now have a second DGX Spark arriving this weekend.

Pairing them together for more compute.

The goal is to run GLM 5.1 locally.

A Hermes Agent running on a $5,000