Jorge Pessoa (@jorge__pessoa) Twitter Tweets • TwiCopy

Andrej Karpathy

@karpathy

2 years ago

Jagged Intelligence The word I came up with to describe the (strange, unintuitive) fact that state of the art LLMs can both perform extremely impressive tasks (e.g. solve complex math problems) while simultaneously struggle with some very dumb problems. E.g. example from two

Noxus

@noxus_ai

a year ago

🚀 New Year New Name: NOXUS Inspired by the ambition and vision of the Noxus organization from League of Legends, Noxus delivers enterprise-ready AI intelligence and automation without building from the ground up. The only AI platform your organization will ever need. #Noxus

Jorge Pessoa

@jorge__pessoa

a year ago

Deepseek hitting back against OpenAI and besting them at their own game while being the actually Open company wasn't on my bingo card for 2025, I'll be honest.

Jorge Pessoa

@jorge__pessoa

a year ago

I recommend everyone take a look at the R1 paper. It feels like the AlphaGo Zero moment of open-source LLMs.

ARC Prize

@arcprize

a year ago

Verified DeepSeek performance on ARC-AGI's Public Eval (400 tasks) + Semi-Private (100 tasks)
DeepSeek V3:
* Semi-Private: 7.3% ($.002)
* Public Eval: 14% ($.002)
DeepSeek Reasoner:
* Semi-Private: 15.8% ($.06)
* Public Eval: 20.5% ($.05)
(Avg $ per task)

João Pedro Almeida

@almeida95joao

a year ago

The most exciting thing about DeepSeek R1 isn’t that it’s open-source or matches OpenAI’s o1 in reasoning tasks—or even that it’s 90–95% cheaper. The real breakthrough? It proves that LLMs can improve reasoning through pure reinforcement learning, without massive CoT datasets.

Jorge Pessoa

@jorge__pessoa

a year ago

We've been migrating some production workloads to R1. It's surprising how it's clearly beating o1 and o1-mini in real use cases for us. It's not a benchmark warrior

François Chollet

@fchollet

a year ago

It's often the case that an excess of resources disincentivizes innovation.

Tony Wu

@tonywu_71

a year ago

The new smaller SmolVLM models just dropped, so ofc we had to train a ColPali version for them! Introducing the ColSmol family: the 500M model can retrieve documents with higher accuracy compared to the original ColPali checkpoint with about 6x less weights 🚀 (1/4 🧵)

Andrej Karpathy

@karpathy

a year ago

It’s done because it’s much easier to 1) collect, 2) evaluate, and 3) beat and make progress on. We’re going to see every task that is served neatly packaged on a platter like this improved (including those that need PhD-grade expertise). But jobs (even intern-level) that need

Deli Chen

@victor207755822

a year ago

Unbelievable results, feels like a dream—our R1 model is now #1 in the world (with style control)! 🌍🏆 Beyond words right now. 🤯 All I know is we keep pushing forward to make open-source AGI a reality for everyone. 🚀✨ #OpenSource #AI #AGI #DeepSeekR1

João Pedro Almeida

@almeida95joao

a year ago

Let’s be real: Most AI tools today are flashy demos that fall apart in real-world enterprise production. There’s a persistent myth that fully unsupervised agents, or swarms of them, can
