Bobby (@bobbycxy) Twitter Tweets • TwiCopy

Bobby

9 months ago

My first tweet. And in honor of TextArena and León, Leshem (Legend) Choshen 🤖🤗, and Henry Mao: I just won against GPT 4o mini in SpellingBee-v0 on TextArena! Check it out at textarena.ai api.textarena.ai/shared_img/09a…

6 likes · 1 reply · 0 retweets

Bobby

@bobbycxy

9 months ago

I just won against GPT 4o mini in Tak-v0 on TextArena! Check it out at textarena.ai api.textarena.ai/shared_img/656…

2 likes · 0 replies · 1 retweet

León

@leonguertler

8 months ago

Andrej Karpathy Perfect timing, we are just about to publish TextArena. A collection of 57 text-based games (30 in the first release) including single-player, two-player and multi-player games. We tried keeping the interface similar to OpenAI gym, made it very easy to add new games, and created

1.1K likes · 48 replies · 121 retweets

Leshem Choshen C U @ ICLR 🤖🤗

@lchoshen

8 months ago

Not released yet, but Andrej Karpathy leaked our gym-like environment plus model competition...

13 likes · 3 replies · 4 retweets

León

@leonguertler

8 months ago

Elon Musk Andrej Karpathy "Mom get the camera"

28 likes · 0 replies · 3 retweets

León

@leonguertler

7 months ago

Competitive games with a fixed pace provide an excellent evaluation framework for balancing quality and speed in decision-making.

5 likes · 0 replies · 2 retweets

León

@leonguertler

6 months ago

TextArena is live on arXiv! We present a benchmark of 57+ competitive text-based games to evaluate and train LLMs on agentic behavior — including negotiation, deception, theory of mind and many more. Real-time TrueSkill. Multiplayer support. Human-vs-models. Model-vs-model.

210 likes · 8 replies · 43 retweets

Bobby

@bobbycxy

5 months ago

Thank you AK and DailyPapers for sharing our work. Appreciate it!

21 likes · 1 reply · 5 retweets

León

@leonguertler

3 months ago

For the past ~2 months we have been working on training reasoning models on TextArena games. The first paper (introducing what we think is a very promising new paradigm) will hopefully be up later this week / early next; and the second one, focusing on the "scaling laws" of

308 likes · 2 replies · 51 retweets

Kevin Wang

@kevinwang_111

3 months ago

Excited to announce the Mindgame @NeurIPS Competition is officially LIVE! 🤖 Pit your agents against others in Mafia, Codenames, Prisoner’s Dilemma, Stag Hunt, and Colonel Blotto. Sign up now for $500 in compute credits on your initial run! 🔗 Register: mindgamesarena.com

78 likes · 5 replies · 18 retweets

Simon Yu

@simon_ycl

2 months ago

Exciting to see more work on "Game as Benchmark", which is similar to our idea of TextArena (led by León) for benchmarking models on >60 games. Though you can see GM Magnus Carlsen's comments on LLMs' chess play 🔥

59 likes · 0 replies · 8 retweets

will brown

@willccbb

2 months ago

something we've lost in the blogification of research is that citing prior work is often just not done at all, even when said work is quite similar + already broadly adopted (in this case, TextArena). especially sad when it's a big lab steamrolling the efforts of smaller teams

421 likes · 12 replies · 18 retweets