Bridgebench (@bridgebench) Twitter Tweets • TwiCopy

BridgeMind

14 days ago

15 days left on my Claude Max plan. Still not resubscribing. 529 errors. Rate limited every peak hour. 100% session usage on a Wednesday afternoon. $200/month for a timer that runs out before lunch. Anthropic has had weeks to fix this. Cut off OpenClaw. Gave us credits.

Likes: 529 • Replies: 82 • Reposts: 36

BridgeMind

@bridgemindai

14 days ago

GLM 5.1 just jumped to #3 on LMArena Code. Behind only Claude Opus 4.6. Beating Claude Sonnet 4.6. Beating GPT 5.4. Beating Gemini 3.1 Pro. A Chinese model at $1.40/$4.40 per million tokens is outperforming models that cost 5x more. The gap between open source and

Likes: 501 • Replies: 60 • Reposts: 37

Bridgebench

@bridgebench

14 days ago

Qwen3 Coder 30B just took #1 on the DGX Spark Bench speed rankings. Nearly double the next fastest model. 193ms time to first token. 82.3 tokens per second. Running locally on an NVIDIA DGX Spark. This is a coding model running on a $5,000 machine sitting on my desk. 82

Likes: 146 • Replies: 24 • Reposts: 9

BridgeMind

@bridgemindai

14 days ago

Day 162 – Vibe Coding an App Until I Make $1,000,000 | ARR: $76,199 x.com/i/broadcasts/1…

Likes: 23 • Replies: 1 • Reposts: 3

BridgeMind

@bridgemindai

13 days ago

Just set up the Hermes Agent on my NVIDIA DGX Spark. It's better than OpenClaw and it's not even close. Self-improving AI agent. It doesn't just execute tasks. It learns from them.

Likes: 160 • Replies: 29 • Reposts: 6

BridgeMind

@bridgemindai

12 days ago

CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%. A 98% increase in

Likes: 6.6K • Replies: 413 • Reposts: 597

BridgeMind

@bridgemindai

11 days ago

Claude Opus 4.5 is now OUTPERFORMING Claude Opus 4.6 on BridgeBench Hallucination. Read that again. The legacy model is beating the current flagship. We benchmarked Opus 4.5 this morning to confirm what we saw yesterday. Claude Opus 4.6 fell from #2 to #10 with a 98%

Likes: 895 • Replies: 73 • Reposts: 71

Bridgebench

@bridgebench

11 days ago

Grok 4.20 Reasoning just took #1 on the new BridgeBench Reasoning benchmark. Beating GPT 5.4 and Claude Opus 4.6. This model keeps climbing every single week. Hallucination #1. Now Reasoning #1. While Anthropic is throwing 500 errors, xAI is quietly building the most

Likes: 115 • Replies: 29 • Reposts: 3

DogeDesigner

@cb_doge

11 days ago

Grok 4.20 Reasoning just took the #1 spot on the BridgeBench reasoning benchmark. 🔥 Beating GPT-5.4, Claude Opus 4.6, Google Gemini and others. Week after week, Grok keeps climbing across benchmarks. 🚀

Likes: 1.1K • Replies: 342 • Reposts: 288

Bridgebench

@bridgebench

11 days ago

Grok 4.20 is dominating

Likes: 26 • Replies: 5 • Reposts: 1

Bridgebench

@bridgebench

11 days ago

Elephant Alpha is nothing to get hyped about. It ranks last on pretty much every benchmark we tested it on except for speed. It is a fast but dumb model. Full results at bridgebench.ai

Likes: 81 • Replies: 16 • Reposts: 3

Bridgebench

@bridgebench

10 days ago

Claude Opus 4.7 launch could be as soon as this week. BridgeBench will be the first benchmark with results.

Likes: 122 • Replies: 7 • Reposts: 5

Bridgebench

@bridgebench

10 days ago

Gemini 3.1 Pro ranks dead last among frontier models on BridgeBench Reasoning. Behind Grok 4.20, GPT 5.4, Claude Opus 4.6, Qwen 3.6 Plus, MiniMax M2.7, Claude Sonnet 4.6, and GLM 5.1. Google's flagship model can't even beat a free Chinese model on grounded reasoning. This is

Likes: 398 • Replies: 80 • Reposts: 21

BridgeMind

@bridgemindai

10 days ago

Claude Code is unusable. So I built my own. Introducing BridgeCode. Launching next Wednesday. Everything you love about Claude Code. No rate limits. No 529 errors. No 500 errors. No nerfing. Connect to any provider. Claude Opus 4.6. GPT 5.4. Gemini 3.1 Pro. GLM 5.1.

Likes: 638 • Replies: 112 • Reposts: 49

BridgeMind

@bridgemindai

10 days ago

Claude Opus 4.6 is performing better today on the BridgeBench Hallucination benchmark. After being caught red-handed nerfing the model, Anthropic has increased its reasoning levels. Claude Opus 4.7 is launching Thursday. Hopefully they don't nerf that model too.

Likes: 421 • Replies: 48 • Reposts: 29

BridgeMind

@bridgemindai

10 days ago

Claude Opus 4.7 is coming this week. Anthropic just tipped their hand. The Claude Code desktop app just got a full redesign. Side-by-side sessions. Scheduled tasks. Voice commands. Skill management. Nice upgrade. But this isn't the main event. This is the precursor.

Likes: 222 • Replies: 18 • Reposts: 6

Bridgebench

@bridgebench

9 days ago

"Gemini 3.1 Pro is so good at UI." No it's not. Look at the BridgeBench results. Last place among frontier models on reasoning. And it shows. If the model can't reason, it can't design. Claude Opus 4.6 makes a weirdly shaped N. GLM 5V Turbo is wildly inconsistent.

Likes: 62 • Replies: 19 • Reposts: 5

BridgeMind

@bridgemindai

9 days ago

The Hermes Agent running on my NVIDIA DGX Spark has generated over $10,000 in partnership deals for BridgeMind. I now have a second DGX Spark arriving this weekend. Pairing them together for more compute. The goal is to run GLM 5.1 locally. A Hermes Agent running on a $5,000

Likes: 261 • Replies: 53 • Reposts: 8