Joan Gamell, bonvivant.eth (@gamell)'s Twitter Profile
Joan Gamell, bonvivant.eth

@gamell

Software Engineer @moov | Photography | Bon Vivant

ID: 774670

Link: https://gamell.io
Joined: 16-02-2007 00:38:09

18 Tweets

748 Followers

1.1K Following

Joan Gamell, bonvivant.eth (@gamell):

I've been playing this Code Arena-generated Tetris more than I care to admit 😂 Try it here …ffc-71f6-a578-48e5f195ac5f.arena.site

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

📊 Leaderboard Ranking Method Update

We’ve refined how ranks are displayed to make them more interpretable and statistically accurate.

Each model now shows:
 • Raw Rank: its position by Arena score (no ties)
 • Rank Spread: the best-to-worst range based on confidence
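
One way to picture the two columns is the minimal sketch below. It assumes each model has an Arena score plus a lower and upper confidence bound, and that the spread counts how many other models' intervals sit clearly above or overlap a given model's interval. The post is truncated and does not confirm this method, so the `Model` type, the `rank_table` function, and the numbers are all hypothetical.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    score: float   # point estimate of the Arena score (made-up values below)
    lower: float   # lower confidence bound (assumed to be available)
    upper: float   # upper confidence bound (assumed to be available)


def rank_table(models: list[Model]) -> dict[str, tuple[int, int, int]]:
    """Return {name: (raw_rank, best_rank, worst_rank)}.

    raw_rank: position when sorting by score alone, so there are no ties.
    best_rank: 1 + number of models whose interval lies entirely above ours.
    worst_rank: 1 + number of models whose interval could exceed our lower bound.
    """
    ordered = sorted(models, key=lambda m: m.score, reverse=True)
    table = {}
    for raw_rank, m in enumerate(ordered, start=1):
        others = [o for o in models if o.name != m.name]
        best = 1 + sum(o.lower > m.upper for o in others)
        worst = 1 + sum(o.upper > m.lower for o in others)
        table[m.name] = (raw_rank, best, worst)
    return table


# Hypothetical scores: A and B overlap, C is clearly behind both.
example = [
    Model("A", 1500, 1490, 1510),
    Model("B", 1495, 1485, 1505),
    Model("C", 1450, 1440, 1460),
]
result = rank_table(example)
```

In this made-up example A and B each get a spread of 1 to 2 because their intervals overlap, while C is pinned at 3.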
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

🚨BREAKING: Google DeepMind’s Gemini-3-Pro is now #1 across all major Arena leaderboards

🥇#1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇#1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.

Massive gains
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

🚨🍌BREAKING: Google DeepMind’s Gemini 3 Pro Image aka Nano Banana Pro is in the Arena!

Built on Gemini 3, which only two days ago landed as #1 across all major Arena leaderboards.

Put it head-to-head in Battle mode with the latest models and judge for yourself if it’s SOTA for
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Today, we’re excited to announce our $150M Series A at a $1.7B valuation—nearly 3× our May seed round. Since launching evaluations in Sept, our annualized consumption run rate has surpassed $30M.

Our mission is clear: to measure and advance the frontier of AI for real-world use,
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Who’s actually leading the AI race? It depends on which leaderboard you look at. On Arena’s Text leaderboard (since May 2023): 🔹OpenAI leads 74% of the time 🔹Google DeepMind 21% 🔹Anthropic 5% But zoom into Expert prompts (~5% of the hardest real-world tasks) and the

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

🚨BIG NEWS: 🎬 Video Arena is now live on the web! Test out Veo 3.1, Sora 2, Seedance v1.5 Pro, Kling 2.6 Pro, Wan 2.5 & more. What started last summer as a small Discord bot experiment has grown into a rigorous way to measure and understand how frontier video models perform

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

LMArena is now Arena. A name that takes us back to our roots with a powerful mission: to measure and advance the frontier of AI for real-world use. We have grown from a small PhD research project to a platform powered by a global community of millions. This rebrand has been

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

👋Say hello to Max! Max is Arena’s intelligent router, powered by 5+ million real-world community votes. Max routes each prompt to the most capable model with latency in mind. AI models excel at different things (code, math, speed, reasoning). Max orchestrates across model

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

🚨BREAKING: Claude Opus 4.6 by Anthropic is now #1 across Code, Text and Expert Arena!

Opus 4.6 shows significant gains across the board:
- #1 Code Arena: +106 score vs Opus 4.5
- #1 Text Arena: scoring 1496, +10 vs Gemini 3 Pro
- #1 Expert Arena: +~50 lead

Congrats to the
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

The new @xAI Grok-Imagine-Image model is a Pareto-optimal model in Image Arena:

The Pareto frontier tells us which model has the highest Arena score at each price point. @xAi’s latest models have improved the frontier, giving optimal performance in the mid-price tier. For a wide
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

✨NEW: Arena Leaderboard UI Updates Millions of votes power the leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best model for your task. Some highlights: • Filter by category (e.g. Coding,

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

⚡️Who powers the Arena leaderboard? You do. But not all votes are Arena level research-grade quality. Every score is built from real-world prompts and human input, continuously refreshed as the way we use AI evolves. In this video, ML Scientist Clayton Thorrez explains how votes are

lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Today, we’re launching a dedicated Multi-File React leaderboard. When Code Arena first launched, we evaluated models on single-file HTML.

Then we raised the bar → multi-file React apps (routing, hooks, components, state management) and now have a leaderboard to match!
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Today we’re launching the Video Edit Arena to evaluate the frontier capability of video models!

- #1 Grok-Imagine-Video, @xAI
- #2 Kling-o3-pro, Kling AI
- #3 Kling-o1-pro, Kling AI
- #4 Gen4-aleph, @Runwayml

The leaderboard is powered by thousands of real-world community
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Did you know? We’re funding independent research in AI evaluation and measurement—up to $50k per project.

The Q1 deadline to apply for Arena’s Academic Partnerships Program is March 31.
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

We’ve added Pareto frontier charts to the leaderboard.

Now available across:
Text, Vision, Search, Document, and Code Arena.

The Pareto frontier curve demonstrates which models are most efficient at their level of performance (by Arena score) vs. a blended price per 1M tokens
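
As a rough illustration of what such a chart encodes, here is a minimal sketch of a Pareto frontier over (price, score) pairs. A model is kept only if no other model is at least as cheap with an equal or higher score. The `pareto_frontier` function and the sample data are hypothetical, and the post does not say how the blended price is computed, so it is treated here as a given number.

```python
def pareto_frontier(models: list[tuple[str, float, float]]) -> list[str]:
    """Return names of models on the price/score Pareto frontier.

    Each entry is (name, price_per_1m_tokens, arena_score).
    Lower price and higher score are both better.
    """
    # Sort by price ascending; at equal price, put the higher score first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier = []
    best_score = float("-inf")
    for name, _price, score in ordered:
        # Keep a model only if it beats every cheaper (or equally priced) one.
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


# Made-up prices and scores, for illustration only.
sample = [
    ("a", 1.0, 1300),
    ("b", 2.0, 1250),   # costs more than "a" and scores lower: dominated
    ("c", 3.0, 1400),
    ("d", 3.0, 1350),   # same price as "c", lower score: dominated
    ("e", 10.0, 1400),  # same score as "c", higher price: dominated
]
frontier = pareto_frontier(sample)  # ["a", "c"]
```

The single sorted pass keeps the sketch at O(n log n); a pairwise dominance check would give the same answer on small leaderboards.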
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Exciting news - GPT-Image-2 by OpenAI has claimed the #1 spot across all Image Arena leaderboards!

A clean sweep with a record-breaking +242 point lead in Text-to-Image - the largest gap we’ve seen to date.

- #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search
lmarena.ai (formerly lmsys.org) (@lmarena_ai):

Exciting news - DeepSeek V4 Pro is in the Arena with 1.6T parameters (49B activated) alongside V4 Flash at 284B parameters (13B activated). Both support 1M token context. It’s a major leap over DeepSeek V3.2!

Code Arena:
- DeepSeek V4 Pro (thinking): #3 open model (#14 overall),