AiBattle (@aibattle_)'s Twitter Profile
AiBattle

@aibattle_

Artificial Intelligence

ID: 1367154052409212928

Joined: 03-03-2021 16:45:27

221 Tweets

544 Followers

293 Following

AiBattle (@aibattle_) 's Twitter Profile Photo

Gemini 3 "9d30" 🆚 Lithiumflow 🆚 Orionmist 🆚 Gemini 2.5 Pro - 3D Voxel first gen starter Pokemon scene

Lithiumflow and Orionmist seem slightly less consistent and overall weaker than some of the Gemini 3 checkpoints I’ve tested on AI Studio

Still, they demonstrate a significant


The new GPT-5 model "Willow" seems to have a low reasoning budget of "16"

For comparison, the reasoning budgets of the Codex models are:

GPT-5-Codex-Low: 64
GPT-5-Codex-Medium: 192
GPT-5-Codex-High: 256

New GPT-5 checkpoints have been added to DesignArena

The reasoning budget of the new models:

Firefly: 0
Chrysalis: 16
Cicada: 64
Caterpillar: 256

In initial testing, the new checkpoints seem a bit better than the old ones

Grok Image "Mandarin"🆚Nano-Banana - Last paragraph from "Faust"

The new Grok Image model "Mandarin" seems really good; its text generation capabilities seem to be on par with Nano-Banana

Impressive how quickly xAI is improving and iterating on their image and video generation

Gemini 3 appears to be rolling out now

The Canvas feature in the mobile app seems to use the new Gemini 3 model

The difference between Web and mobile for the 3D Pokemon voxel scene is huge


xAI has been testing different versions of Grok 4.1 on LMArena and OpenRouter

Don’t expect a huge jump in performance; significant improvements will likely come with Grok 5

The Gemini 3 Pro model card has been leaked

The reported scores are very impressive: ARC-AGI-2: 31.1%, Humanity's Last Exam: 37.5%, and Terminal-Bench 2.0: 54.2%