Computer Intelligence (@bigcomproject)'s Twitter Profile
Computer Intelligence

@bigcomproject

SWE Arena: swe-arena.com

More data is coming!

ID: 1877355162031063040

Link: https://bigcomputer-project.github.io/ · Joined: 09-01-2025 14:02:17

27 Tweets

128 Followers

0 Following

Computer Intelligence (@bigcomproject):

Introducing 🏟️ SWE Arena: An Open Evaluation Platform for Vibe Coding. Unlike current frontend-dev applications such as Anthropic's Claude Artifacts and v0, SWE Arena aims to execute ANY program 🚀, allowing users to compare coding capabilities more accurately and…

AK (@_akhaliq):

SWE Arena looks amazing for vibe coding. The Arena supports real-time code execution and rendering, covering various frontier LLMs and VLMs.

Terry Yue Zhuo (@terryyuezhuo):

gpt-4o-mini-2024-07-18 vs gpt-4o-2024-08-06: This is what AI can design at the moment :( Test it yourself in SWE Arena. Prompt and link are below 👇

Terry Yue Zhuo (@terryyuezhuo):

Case: claude-3-5-haiku-20241022 vs o1-2024-12-17. o1 wins! We found a bug in the program from claude-3-5-haiku-20241022: the ball just falls out of the spinning hexagon 😂 Try it out yourself at SWE Arena, with all kinds of frontier models and 100%.

Terry Yue Zhuo (@terryyuezhuo):

Case of Chart Derendering (image to code) on SWE Arena: claude-3-5-sonnet-20241022 vs gpt-4o-mini-2024-07-18. Claude 3.5 Sonnet is much better, right?! Wdyt?

Terry Yue Zhuo (@terryyuezhuo):

What if I told you that you can generate the form in SWE Arena and print it out?! Case of Screenshot2Code: gpt-4o-mini-2024-07-18 vs claude-3-5-sonnet-20241022. Claude 3.5 Sonnet wins! Claude 3.5 Sonnet is much better at recognizing text in the image. SWE Arena is 100% free to…

Terry Yue Zhuo (@terryyuezhuo):

Love this example! Congrats to Google DeepMind! I successfully replicated the Boggle game on SWE Arena. TL;DR: Gemini 2.0 Pro won! In Claude 3.5 Sonnet vs Gemini 2.0 Pro, both models were given the image and the prompt and asked to generate the algorithm that produces the results. The image and…

Terry Yue Zhuo (@terryyuezhuo):

gemini-2.0-pro-exp-02-05 vs o1-2024-12-17: The code generated by Gemini 2.0 Pro looks better than o1's??! Do your own vibe coding on SWE Arena! 100% free! 👇

Terry Yue Zhuo (@terryyuezhuo):

You can replicate this in SWE Arena! Case of animation generation: gemini-2.0-pro-exp-02-05 vs claude-3-5-haiku-20241022. Link: swe-arena.com. The prompt is below 👇

Terry Yue Zhuo (@terryyuezhuo):

Today, we are announcing a collaboration between SWE Arena (Computer Intelligence) and Hugging Face (with Gradio). We believe that Hugging Face can help us shape the future of AI software engineering evaluations. We have now open-sourced the SWE Arena codebase to accelerate the development of…

Terry Yue Zhuo (@terryyuezhuo):

Doing this on SWE Arena with side-by-side models, using the Hugging Face logo as an example: gpt-4 vs qwen2.5-72b-instruct. While gpt-4 used Gradio to program a buggy app, qwen2.5-72b-instruct used React and built the same app as o3-mini did, without bugs.

Terry Yue Zhuo (@terryyuezhuo):

gpt-3.5-turbo vs o3-mini. Prompt: "make an app called chatgpt ad maker that takes in a video and does a halftone effect with sliders to adjust dot size". Input: Bad Apple!! by Touhou (with sound).