Jorge Pessoa (@jorge__pessoa)'s Twitter Profile
Jorge Pessoa

@jorge__pessoa

Founder @ noxus.ai

ID: 1164467552078184448

Joined 22-08-2019 09:21:27

16 Tweets

12 Followers

37 Following

Andrej Karpathy (@karpathy)

Jagged Intelligence

The word I came up with to describe the (strange, unintuitive) fact that state-of-the-art LLMs can perform extremely impressive tasks (e.g. solve complex math problems) while simultaneously struggling with some very dumb problems.

E.g. example from two

Noxus (@noxus_ai)

🚀 New Year New Name: NOXUS

Inspired by the ambition and vision of the Noxus organization from League of Legends, Noxus delivers enterprise-ready AI intelligence and automation without you having to build from the ground up.

The only AI platform your organization will ever need. #Noxus

Jorge Pessoa (@jorge__pessoa)

DeepSeek hitting back at OpenAI and beating them at their own game, while being the actually Open company, wasn't on my bingo card for 2025, I'll be honest.

ARC Prize (@arcprize)

Verified DeepSeek performance on ARC-AGI's Public Eval (400 tasks) + Semi-Private (100 tasks)

DeepSeek V3:
* Semi-Private: 7.3% ($.002)
* Public Eval: 14% ($.002)

DeepSeek Reasoner:
* Semi-Private: 15.8% ($.06)
* Public Eval: 20.5% ($.05)

(Avg $ per task)

João Pedro Almeida (@almeida95joao)

The most exciting thing about DeepSeek R1 isn’t that it’s open-source or matches OpenAI’s o1 in reasoning tasks—or even that it’s 90–95% cheaper.

The real breakthrough? It proves that LLMs can improve reasoning through pure reinforcement learning, without massive CoT datasets.

Jorge Pessoa (@jorge__pessoa)

We've been migrating some production workloads to R1. It's surprising how clearly it's beating o1 and o1-mini in real use cases for us. It's not a benchmark warrior.

Tony Wu (@tonywu_71)

The new smaller SmolVLM models just dropped, so ofc we had to train a ColPali version for them!

Introducing the ColSmol family: the 500M model can retrieve documents with higher accuracy than the original ColPali checkpoint with about 6x fewer weights 🚀 (1/4 🧵)

Andrej Karpathy (@karpathy)

It’s done because it’s much easier to 1) collect, 2) evaluate, and 3) beat and make progress on. We’re going to see every task that is served neatly packaged on a platter like this improved (including those that need PhD-grade expertise). But jobs (even intern-level) that need

Deli Chen (@victor207755822)

Unbelievable results, feels like a dream—our R1 model is now #1 in the world (with style control)! 🌍🏆 Beyond words right now. 🤯 All I know is we keep pushing forward to make open-source AGI a reality for everyone. 🚀✨ #OpenSource #AI #AGI #DeepSeekR1

João Pedro Almeida (@almeida95joao)

Let’s be real: Most AI tools today are flashy demos that fall apart in real-world enterprise production. There’s a persistent myth that fully unsupervised agents—or swarms of them—can