Matan Halevy (@matanhalevy)'s Twitter Profile
Matan Halevy

@matanhalevy

building rn
same aura as geoffrey hinton

ID: 550362839

10-04-2012 19:07:44

31 Tweets

48 Followers

89 Following

Matan Halevy (@matanhalevy)

I’m fascinated by how people eval world models like this. Is there a way to do it without human feedback? Can we do some basic GAN-like approach where you try to convince a Waymo AI it is in the real world by making the output that realistic?

Matan Halevy (@matanhalevy)

Superintelligence is too important a pursuit to be gatekept by select players. Prime Intellect is doing some of the most important work to ensure our future technology remains democratized.

Andon Labs (@andonlabs)

GLM-5 takes 4th place on Vending-Bench 2. Above Claude Sonnet 4.5, the state-of-the-art model less than 6 months ago. China seems to be 6 months behind the West. By June they will be ahead if the trends continue. More in this thread on why we don't think this will happen.

Matan Halevy (@matanhalevy)

i have Z.ai's GLM-5 playing Civilization against Opus 4.6 and GLM-5 is exploring in a Z shape, did we just hit brand-aware AI 🤔🤔

Aakash Gupta (@aakashg0)

Sundar buried the real story in the cost data. Gemini 3 Deep Think went from 45.1% to 84.6% on ARC-AGI-2 in under 3 months. That’s an 88% improvement on a benchmark specifically designed to resist brute-force scaling. The number that matters: $13.62 per task. The previous Deep