XLANG NLP Lab (@xlangnlp) Twitter Tweets • TwiCopy

XLANG NLP Lab

@xlangnlp

Developing embodied AI agents that empower users to use language to interact with digital and physical environments and carry out real-world tasks.

ID: 1678044379121057792

Link: https://xlang.ai · Date: 09-07-2023 14:12:50

103 Tweets

894 Followers

27 Following

Bowen Wang

@bowenwangnlp

8 months ago

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL and more 🕹️ Test agents on 100+ real apps & websites with one-click config 🔒 Safe & free

Likes: 333 · Replies: 14 · Retweets: 104

Tianbao Xie

@tianbaox

8 months ago

Finally we are here! 👏 Check out our most open & fair benchmark⚔️ for computer use capability evaluation for the community.

Likes: 30 · Replies: 8 · Retweets: 6

Tao Yu

@taoyds

8 months ago

🚀 After a year of development based on our OSWorld, Computer Use Agent Arena is LIVE! Test top AI agents (Operator, Claude 3.7...) on any kind of computer-use task with zero setup. Cloud-hosted, safe, and FREE! Try it now: arena.xlang.ai! Data & code coming soon!

Likes: 101 · Replies: 5 · Retweets: 19

XLANG NLP Lab

@xlangnlp

8 months ago

👉 Compare and test Computer Use Agents (Operator, Claude 3.7...) on any kind of task on real computers 🚩without any setup or cost🚩! Try our Computer Use Agent Arena: arena.xlang.ai

Likes: 8 · Replies: 0 · Retweets: 0

lmarena.ai (formerly lmsys.org)

@lmarena_ai

8 months ago

Check out Computer Use Agent Arena, an exciting new launch by OSWorld team XLANG NLP Lab!

Likes: 136 · Replies: 2 · Retweets: 7

XLANG NLP Lab

@xlangnlp

8 months ago

🚀 Exciting news! OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at arena.xlang.ai! Join the community and dive in!

Likes: 14 · Replies: 2 · Retweets: 4

XLANG NLP Lab

@xlangnlp

8 months ago

🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to arena.xlang.ai.

Likes: 17 · Replies: 0 · Retweets: 7

XLANG NLP Lab

@xlangnlp

7 months ago

🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from Anthropic ties for #1 in Computer Agent Arena, followed by Operator from OpenAI & UI-TARS-1.5 from ByteDance, a ranking that differs significantly from prior benchmarks! Check the full rankings! 👉 arena.xlang.ai/leaderboard

Likes: 89 · Replies: 2 · Retweets: 23

Bowen Wang

@bowenwangnlp

7 months ago

😀Our initial leaderboard finally came out, here I'd like to share a few interesting findings based on our case study: 1, Claude 3.7 Sonnet consistently performs best across diverse task types, particularly excelling at open-ended queries like “write a paper reading report.” 2,

Likes: 14 · Replies: 1 · Retweets: 5