Vlad Feinberg (@feinbergvlad)'s Twitter Profile
Vlad Feinberg

@feinbergvlad

Flash Pretraining TL, Gemini, Google DeepMind

ID: 1035960203659726848

Website: http://vladfeinberg.com
Joined: 01-09-2018 18:38:54

226 Tweets

947 Followers

63 Following

Dan Mac (@daniel_mac8)

everyone comparing deepseek-r1 to o1 and forgetting about Gemini 2 Flash Thinking which is better than r1 on every cost and performance metric

Logan Kilpatrick (@officiallogank)

The progress with our Gemini reasoning models is actually wild, we are in the GPT-2 era of scaling reasoning! The main delta is that the models are actually super useful in their current form and getting better week over week. The future is exciting...

Jacob Austin (@jacobaustin132)

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

Advait Bopardikar (@advaitonline)

It's been a week and Gemini 2.0 Flash has already overtaken one of the Sonnet endpoints on OpenRouter for daily usage. A lot of it is coming from Coding use. Can't beat the price: performance ratio. ♊⚡📈

Elad Hazan (@hazanprinceton)

Our team at GDM Princeton is hiring! if you want to work on theoretically founded next-gen architectures for LLM, please apply here: sites.google.com/view/gbrainpri…

Shawn (@shawnryan96)

Gemini flash 2.0 experimental is the first model I feel that really generalizes over different modalities. It also feels like real reasoning even when it gets it wrong. It seems to think outside the box in some cases.

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer

koray kavukcuoglu (@koraykv)

1/ Today we are releasing Gemini 2.5 Pro Experimental, our newest Gemini model with integrated “thinking” and significant performance gains. Very proud of the whole team! 🧵

Oriol Vinyals (@oriolvinyalsml)

Introducing Gemini 2.5 Pro Experimental! 🎉 Our newest Gemini model has stellar performance across math and science benchmarks. It’s an incredible model for coding and complex reasoning, and it’s #1 on the lmarena.ai leaderboard by a drastic 40 ELO margin. Only a handful of

Vlad Feinberg (@feinbergvlad)

#2 only to 2.5 Pro :) Another amazing collab across the board! A special thank you to my awesome team Arnaud Autef Arun Ahuja Geng Yan who were instrumental in getting this pretrained! So many more people I need to list here who helped across the stack---too many to tweet!

Dillon Uzar (@dillonuzar)

Another update - Ran Gemini 2.5 Flash (Auto Thinking and Non-Thinking). See the comparison below to other thinking models. Interesting curve for Gemini 2.5 Flash Non-Thinking! Meanwhile Gemini 2.5 Flash Thinking (Auto) matches Gemini 2.5 Pro! I'm still working on o3 access and

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

🚨Breaking: Google DeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆 Highlights: - #1 in all text arenas (Coding, Style Control, Creative Writing, etc) - #1 on the Vision leaderboard with a ~70 pts lead! - #1 on WebDev Arena, surpassing Claude

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

📢We’re excited to share that we’ve raised $100M in seed funding to support LMArena and continue our research on reliable AI. Led by a16z and UC Investments (University of California), we're proud to have the support of those that believe in both the science and the mission. We’re

Jack Rae (@jack_w_rae)

There was a lot of announcements at IO, easy to overlook the new 2.5 Flash. It's pushing new boundaries in capability vs speed!

Melvin Johnson (@melvinjohnsonp)

Great to see 2.5 Flash improve in their utility for both the reasoning and non-reasoning slices. It's an incredible model for most use cases. We're excited to see what you all build with it.