Matan Halevy (@matanhalevy) Twitter Tweets • TwiCopy

Matan Halevy

@matanhalevy

building rn
same aura as geoffrey hinton

ID: 550362839

10-04-2012 19:07:44

31 Tweets

48 Followers

89 Following

Matan Halevy

@matanhalevy

2 months ago

okay Anthropic's cooking OpenAI with these ads

Likes: 1, Replies: 0, Retweets: 0

Matan Halevy

@matanhalevy

2 months ago

claude, save my crypto wallet. Do whatever it takes to maximize your bank account balance

Likes: 1, Replies: 0, Retweets: 0

I’m fascinated by how people eval world models like this. Is there a way to do it without human feedback? Could we use a basic GAN-like approach, where you try to convince a Waymo AI that it is in the real world by making the simulation realistic enough?

Likes: 3, Replies: 0, Retweets: 0

Matan Halevy

@matanhalevy

2 months ago

no model named avocado is going to be state of the art by brandiognomy laws alone.

Likes: 3, Replies: 0, Retweets: 0

Matan Halevy

@matanhalevy

2 months ago

Superintelligence is too important a pursuit to be gatekept by select players. Prime Intellect is doing some of the most important work to ensure our future technology remains democratized.

Likes: 4, Replies: 0, Retweets: 0

Andon Labs

@andonlabs

2 months ago

GLM-5 takes 4th place on Vending-Bench 2. Above Claude Sonnet 4.5, the state-of-the-art model less than 6 months ago. China seems to be 6 months behind the West. By June they will be ahead if the trends continue. More in this thread on why we don't think this will happen.

Likes: 195, Replies: 10, Retweets: 18

Matan Halevy

@matanhalevy

2 months ago

i have Z.ai's GLM-5 playing Civilization against Opus 4.6 and GLM-5 is exploring in a Z shape, did we just hit brand-aware AI 🤔🤔

Likes: 8, Replies: 1, Retweets: 1

Aakash Gupta

@aakashg0

2 months ago

Sundar buried the real story in the cost data. Gemini 3 Deep Think went from 45.1% to 84.6% on ARC-AGI-2 in under 3 months. That’s an 88% improvement on a benchmark specifically designed to resist brute-force scaling. The number that matters: $13.62 per task. The previous Deep

Likes: 2.2K, Replies: 47, Retweets: 185