Joan Gamell, bonvivant.eth (@gamell) Twitter Tweets • TwiCopy

Joan Gamell, bonvivant.eth

@gamell

5 months ago

White Christmas in La Jolla

Likes: 0 · Replies: 0 · Reposts: 0

Joan Gamell, bonvivant.eth

@gamell

5 months ago

I've been playing this Code Arena-generated Tetris more than I care to admit 😂 Try it here …ffc-71f6-a578-48e5f195ac5f.arena.site

Likes: 2 · Replies: 0 · Reposts: 0

lmarena.ai (formerly lmsys.org)

@lmarena_ai

5 months ago

📊 Leaderboard Ranking Method Update

We’ve refined how ranks are displayed to make them more interpretable and statistically accurate. Each model now shows:
• Raw Rank: its position by Arena score (no ties)
• Rank Spread: the best-to-worst range based on confidence

Likes: 226 · Replies: 12 · Reposts: 20
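
The ranking tweet above is cut off after "based on confidence", so the exact method is not stated here. The following is a minimal sketch of one plausible reading, assuming each model has a score with a lower and upper confidence bound. The function name `rank_spread`, the model names, and all numbers are hypothetical and are not taken from Arena.

```python
def rank_spread(intervals):
    """Map each model name to (raw_rank, best_rank, worst_rank).

    `intervals` maps name -> (score, lower, upper). This layout is an
    assumption made for illustration, not Arena's actual data format.
    """
    # Raw rank: position when sorted by score, highest first.
    ordered = sorted(intervals, key=lambda n: intervals[n][0], reverse=True)
    result = {}
    for raw_rank, name in enumerate(ordered, start=1):
        _, lower, upper = intervals[name]
        others = [n for n in intervals if n != name]
        # Best rank: only models whose whole interval lies above ours must outrank us.
        best = 1 + sum(1 for n in others if intervals[n][1] > upper)
        # Worst rank: any model whose interval reaches above our lower bound could outrank us.
        worst = 1 + sum(1 for n in others if intervals[n][2] > lower)
        result[name] = (raw_rank, best, worst)
    return result


# Hypothetical scores and confidence bounds.
example = {
    "model-a": (1500, 1490, 1510),
    "model-b": (1495, 1485, 1505),
    "model-c": (1450, 1440, 1460),
}
spread = rank_spread(example)
```

With these made-up numbers, the first two models have overlapping intervals and share a spread of 1 to 2, while the third is separated from both and stays at 3.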

lmarena.ai (formerly lmsys.org)

@lmarena_ai

5 months ago

🚨BREAKING: Google DeepMind’s Gemini-3-Pro is now #1 across all major Arena leaderboards

🥇#1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇#1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.

Massive gains

Likes: 1.1K · Replies: 78 · Reposts: 227

lmarena.ai (formerly lmsys.org)

@lmarena_ai

5 months ago

🚨🍌BREAKING: Google DeepMind’s Gemini 3 Pro Image aka Nano Banana Pro is in the Arena!

Built on Gemini 3, which only two days ago landed as #1 across all major Arena leaderboards.

Put it head-to-head in Battle mode with the latest models and judge for yourself if it’s SOTA for

Likes: 299 · Replies: 20 · Reposts: 22

lmarena.ai (formerly lmsys.org)

@lmarena_ai

4 months ago

Today, we’re excited to announce our $150M Series A at a $1.7B valuation—nearly 3× our May seed round. Since launching evaluations in Sept, our annualized consumption run rate has surpassed $30M. Our mission is clear: to measure and advance the frontier of AI for real-world use,

Likes: 643 · Replies: 71 · Reposts: 61

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

Who’s actually leading the AI race? It depends on which leaderboard you look at.

On Arena’s Text leaderboard (since May 2023):
🔹OpenAI leads 74% of the time
🔹Google DeepMind 21%
🔹Anthropic 5%

But zoom into Expert prompts (~5% of the hardest real-world tasks) and the

Likes: 168 · Replies: 19 · Reposts: 14

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

🚨BIG NEWS: 🎬 Video Arena is now live on the web! Test out Veo 3.1, Sora 2, Seedance v1.5 Pro, Kling 2.6 Pro, Wan 2.5 & more. What started last summer as a small Discord bot experiment has grown into a rigorous way to measure and understand how frontier video models perform

Likes: 210 · Replies: 22 · Reposts: 32

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

LMArena is now Arena. A name that takes us back to our roots with a powerful mission: to measure and advance the frontier of AI for real-world use. We have grown from a small PhD research project to a platform powered by a global community of millions. This rebrand has been

Likes: 675 · Replies: 41 · Reposts: 59

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

👋Say hello to Max! Max is Arena’s intelligent router, powered by 5+ million real-world community votes. Max routes each prompt to the most capable model with latency in mind. AI models excel at different things (code, math, speed, reasoning). Max orchestrates across model

Likes: 230 · Replies: 9 · Reposts: 20

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

🚨BREAKING: Claude Opus 4.6 by Anthropic is now #1 across Code, Text and Expert Arena!

Opus 4.6 shows significant gains across the board:
- #1 Code Arena: +106 score vs Opus 4.5
- #1 Text Arena: scoring 1496, +10 vs Gemini 3 Pro
- #1 Expert Arena: +~50 lead

Congrats to the

Likes: 743 · Replies: 28 · Reposts: 68

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 months ago

The new @xAI Grok-Imagine-Image model is a Pareto-optimal model in Image Arena: The Pareto frontier tells us which model has the highest Arena score at each price point. @xAI’s latest models have improved the frontier, giving optimal performance in the mid-price tier. For a wide

Likes: 957 · Replies: 132 · Reposts: 163

lmarena.ai (formerly lmsys.org)

@lmarena_ai

2 months ago

✨NEW: Arena Leaderboard UI Updates

Millions of votes power the leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best model for your task.

Some highlights:
• Filter by category (e.g. Coding,

Likes: 97 · Replies: 8 · Reposts: 13

lmarena.ai (formerly lmsys.org)

@lmarena_ai

2 months ago

⚡️Who powers the Arena leaderboard? You do. But not all votes are Arena level research-grade quality. Every score is built from real-world prompts and human input, continuously refreshed as the way we use AI evolves. In this video, ML Scientist Clayton Thorrez explains how votes are

Likes: 45 · Replies: 4 · Reposts: 4

lmarena.ai (formerly lmsys.org)

@lmarena_ai

2 months ago

Today, we’re launching a dedicated Multi-File React leaderboard. When Code Arena first launched, we evaluated models on single-file HTML. Then we raised the bar → multi-file React apps (routing, hooks, components, state management) and now have a leaderboard to match!

Likes: 144 · Replies: 11 · Reposts: 8

lmarena.ai (formerly lmsys.org)

@lmarena_ai

a month ago

Today we’re launching the Video Edit Arena to evaluate the frontier capability of video models!

- #1 Grok-Imagine-Video, @xAI
- #2 Kling-o3-pro, Kling AI
- #3 Kling-o1-pro, Kling AI
- #4 Gen4-aleph, @Runwayml

The leaderboard is powered by thousands of real-world community

Likes: 194 · Replies: 17 · Reposts: 17

lmarena.ai (formerly lmsys.org)

@lmarena_ai

a month ago

Did you know? We’re funding independent research in AI evaluation and measurement—up to $50k per project. The Q1 deadline to apply for Arena’s Academic Partnerships Program is March 31.

Likes: 30 · Replies: 1 · Reposts: 7

lmarena.ai (formerly lmsys.org)

@lmarena_ai

24 days ago

We’ve added Pareto frontier charts to the leaderboard. Now available across: Text, Vision, Search, Document, and Code Arena. The Pareto frontier curve demonstrates which models are most efficient at their level of performance (by Arena score) vs. a blended price per 1M tokens

Likes: 210 · Replies: 14 · Reposts: 25
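
The Pareto frontier described in the tweet above (highest Arena score at each price point) can be sketched roughly as follows. This is an illustrative sketch only: the `Model` type, the `pareto_frontier` function, the model names, and every price and score are hypothetical, and Arena's actual blended-price calculation is not shown in the source.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # hypothetical blended price per 1M tokens
    score: float  # hypothetical Arena score


def pareto_frontier(models):
    """Keep models that no cheaper-or-equally-priced model matches or beats on score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal price, try the higher score first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        # A model joins the frontier only if it beats every cheaper model's score.
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Made-up data for illustration.
models = [
    Model("model-a", 1.0, 1300.0),
    Model("model-b", 2.0, 1280.0),
    Model("model-c", 3.0, 1400.0),
    Model("model-d", 10.0, 1450.0),
    Model("model-e", 12.0, 1420.0),
]
frontier_names = [m.name for m in pareto_frontier(models)]
```

In this made-up data, model-b and model-e fall off the frontier because a cheaper model already scores higher than each of them.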

lmarena.ai (formerly lmsys.org)

@lmarena_ai

3 days ago

Exciting news - GPT-Image-2 by OpenAI has claimed the #1 spot across all Image Arena leaderboards!

A clean sweep with a record-breaking +242 point lead in Text-to-Image - the largest gap we’ve seen to date.

- #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search

Likes: 2.2K · Replies: 98 · Reposts: 323

lmarena.ai (formerly lmsys.org)

@lmarena_ai

a day ago

Exciting news - DeepSeek V4 Pro is in the Arena with 1.6T parameters (49B activated) alongside V4 Flash at 284B parameters (13B activated). Both support 1M token context. It’s a major leap over DeepSeek V3.2!

Code Arena:
- DeepSeek V4 Pro (thinking): #3 open model (#14 overall),

Likes: 1.1K · Replies: 50 · Reposts: 139