Joel Z (@thecodeofjoel)'s Twitter Profile
Joel Z

@thecodeofjoel

I code things sometimes. 🪐
PFP: @Elmi39Project | faunaraara.com | sanaspacebirthdayproject.com | node-chatgpt-api | Pioneer for Reddit (for AVP)

ID: 1439288761850810369

Link: https://github.com/waylaidwanderer
Joined: 18-09-2021 18:02:42

1.1K Tweets

1.1K Followers

38 Following

Joel Z (@thecodeofjoel)

ClaudePlaysPokemon restarted with Claude 4 so for fun we're restarting too! You'll be able to watch Claude and Gemini play side-by-side, exploring each model and their harnesses' strengths and weaknesses! (Note: don't treat this as a serious race!) holodex.net/multiview/AAGY…

OpenAI Developers (@openaidevs)

Watch o3 play Pokémon—live. See how it plans its next move, explains its reasoning, analyzes the map visually, and saves to memory. Thank you community member Clad3815 for putting this stream together!

Joel Z (@thecodeofjoel)

Test run of Pkmn Yellow Legacy is live! This new run comes with an updated harness: ➕notepad ➖predefined agents deleted ➕AI creates custom agents ➕code execution ... and more! Notepad and agents are tracked live: github.com/waylaidwandere… Watch now: twitch.tv/gemini_plays_p…

swyx (@swyx)

this thing in the Gemini 2.5 tech report actually redeems Gemini Plays Pokemon quite a bit - a lot of people were complaining that Gemini Plays Pokemon started with a Blastoise and received constant developer edits during the run (and therefore is not comparable with Claude

Joel Z (@thecodeofjoel)

I'm working on a post about my Gemini Plays Pokemon project. This will be the first developer blog post I've ever written. I hope to give you some interesting behind-the-scenes information. Follow to be notified: blog.jcz.dev

Joel Z (@thecodeofjoel)

It's my birthday today! As a gift, please come by the stream and say hi to Gem 🥳 Still streaming Pokemon Yellow Legacy (hard mode), and Gemini is making good progress with its new harness that lets it code and create its own agents freely! twitch.tv/gemini_plays_p…

Joel Z (@thecodeofjoel)

This shows how even a simple notepad can make a big difference for long-horizon tasks. In the current Pokémon Yellow Legacy run, Gemini uses one to keep track of goals, plans, and issues. twitch.tv/geminiplayspok…

Joel Z (@thecodeofjoel)

As models improve, complex scaffolding becomes less necessary — a strong system prompt is often enough. Benchmarks like Gemini Plays Pokémon help show where today’s SOTA models fall along that spectrum.

Joel Z (@thecodeofjoel)

I wrote up the making-of for Gemini Plays Pokémon: how I designed the scaffold so Gemini 2.5 Pro could handle a long-horizon game, what failed, and the lessons that made it work. Full post: blog.jcz.dev/the-making-of-…

Google for Developers (@googledevs)

🎮✨ Gemini 2.5 Pro took on Pokémon Blue and became the champion. Twice. Now it’s tackling even tougher challenges in Pokémon Yellow Legacy, and is creating its own tools and subagents as it goes. Tune in to the Elite Four live now on hard mode with level caps, no items in

Kiran Vodrahalli (kiranvodrahalli@mathstodon.xyz) (@kiranvodrahalli)

Andrew Carr (e/🤸) hey heads up, this is inaccurate on multiple levels -- Gemini 2.5 Pro finished in 35k actions (see arxiv.org/pdf/2507.06261 Fig 15a), Claude has not finished at all, the definition of a "step" is different across all three (see the 2.5 report), and tools/harnesses are different

Kiran Vodrahalli (kiranvodrahalli@mathstodon.xyz) (@kiranvodrahalli)

I've been seeing a few posts lately comparing GPT-5 vs o3 on Pokemon that have some misconceptions - to exemplify the problem with these comparisons, one "step" for GPT-5 can include a lookup to game knowledge that instantly solves certain complex puzzles like Cinnabar Mansion
