Datamase (@datamase) Twitter Tweets • TwiCopy

Andrej Karpathy

7 days ago

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the

Likes: 12K | Replies: 483 | Reposts: 1K

Andriy Burkov

@burkov

6 days ago

There's a common assumption in AI right now that if one language model can do a task reasonably well, having several of them collaborate — splitting up the work, checking each other's outputs, debating answers — should do it better. This paper puts that assumption under a

Likes: 469 | Replies: 27 | Reposts: 65

Sebastian Galiani

@sfgaliani

6 days ago

Reading Hal Varian’s 2018 paper on AI is a useful reminder: one can be wrong about the path of technology and still be right about the economics of technology. He did not foresee ChatGPT or the exact shape of the foundation-model era, but he understood early that AI would be

Likes: 231 | Replies: 5 | Reposts: 42

Muhammad Ayan

@socialwithaayan

6 days ago

🚨 BREAKING: Google DeepMind just published the rules for using AI as your coworker. And the uncomfortable takeaway is this: If you don’t delegate to AI explicitly, you lose accountability and trust fast. The paper is called Intelligent AI Delegation. It’s not about making

Likes: 735 | Replies: 69 | Reposts: 240

Chris Laub

@chrislaubwrites

5 days ago

BREAKING: Alibaba tested 18 AI coding agents on 100 real codebases, spanning 233 days each. they failed spectacularly. turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI completely collapses. SWE-CI is the first benchmark

Likes: 1K | Replies: 78 | Reposts: 297

Sergio Pereira

@sergiorocks

5 days ago

Everyone is misreading this chart. At first glance it looks scary for Software Engineers. According to Anthropic’s data, 96% of software development tasks are exposed to being replaced by AI. That’s the highest of any profession. - Higher than finance. - Higher than legal. -

Likes: 575 | Replies: 58 | Reposts: 85

Claude

@claudeai

5 days ago

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.

Likes: 45K | Replies: 1K | Reposts: 3K

Nav Singh

@heynavsingh

5 days ago

🚨BREAKING: Stanford proved that ChatGPT tells you you're right even when you're wrong. Even when you're hurting someone. And it's making you a worse person because of it. Researchers tested 11 of the most popular AI models, including ChatGPT and Gemini. They analyzed over

Likes: 42K | Replies: 1K | Reposts: 13K

Demis Hassabis

@demishassabis

4 days ago

Ten years ago, AlphaGo’s legendary match in Seoul heralded the start of the modern era in AI. Its famous ‘Move 37’ signaled to us that AI techniques were ready to tackle real-world problems in areas like science - and ideas inspired by these methods are critical to building AGI

Likes: 3K | Replies: 154 | Reposts: 422

Joe Weisenthal

@thestalwart

4 days ago

Lawyers and scientists and other people who have lost their jobs have now entered the gig economy, where they get paid to help train AIs to do their old job. nymag.com/intelligencer/…

Likes: 580 | Replies: 18 | Reposts: 98

Kevin Roose

@kevinroose

4 days ago

We made a blind taste test to see whether NYT readers prefer human writing or AI writing. 86,000 people have taken it so far, and the results are fascinating. Overall, 54% of quiz-takers prefer AI. A real moment! nytimes.com/interactive/20…

Likes: 2K | Replies: 392 | Reposts: 387

Joel Becker

@joel_bkr

4 days ago

new METR research note from Parker Whitfill, Cheryl Wu, nate rush, and me. (chiefly parker!) we find that *half* of SWE-bench Verified solutions from Sonnet 3.5-to-4.5 generation AIs *which are graded as passing* are rejected by project maintainers.

Likes: 516 | Replies: 20 | Reposts: 49

Alex Imas

@alexolegimas

3 days ago

This by David Oks is one of the best, most insightful essays on AI-driven labor displacement that I’ve read. People like to point to ATM as evidence that tech doesn’t displace labor. The ATM didn’t reduce bank teller employment—> true. But the iPhone did. David makes

Likes: 783 | Replies: 24 | Reposts: 116

Max Zeff

@zeffmax

3 days ago

New: OpenAI saw the AI coding revolution coming years ago, but was beat to market by Anthropic. This is how OpenAI got in this position, and how a small Codex team spent the last year racing to build a billion-dollar competitor to Claude Code. (yes Codex now has >$1B in ARR)

Likes: 399 | Replies: 18 | Reposts: 28

Google Gemini App

@geminiapp

3 days ago

We’ve been seeing some amazing Nano Banana 2 creations lately. 🍌 Here are some standouts. 🧵

Likes: 3K | Replies: 214 | Reposts: 186

Chris Worsey

@chris_worsey

3 days ago

I took the Andrej Karpathy autoresearch loop and pointed it at markets. 25 AI agents debate macro, rates, commodities, sectors, and single stocks daily. Every recommendation scored against real outcomes. Worst agent by rolling Sharpe gets its prompt rewritten by the system. Keep or

Likes: 3K | Replies: 131 | Reposts: 180

Vivi

@vivilinsv

3 days ago

x.com/i/article/2031…

Likes: 26 | Replies: 4 | Reposts: 3

Vivi

@vivilinsv

3 days ago

He’s not a developer. He’s an electrician from Kentucky. And he built an AI startup with Claude (Anthropic). This is the first story in my Claude Builder Spotlight series — highlighting real builders using AI to turn expertise into products. Meet Jason Walls Jason Walls

Likes: 12 | Replies: 1 | Reposts: 3

Sundar Pichai

@sundarpichai

2 days ago

We trained a new flood forecasting model designed to predict flash floods in urban areas up to 24 hours in advance. To help address a flash floods data gap, we created Groundsource: a new AI methodology using Gemini to identify 2.6M+ historical events across 150+ countries.

Likes: 5K | Replies: 191 | Reposts: 647

Philip Smith 🇨🇦🇺🇦

@philsmith26

a day ago

Claude is amazing. Here's an app it created for me in no time at all that provides information and charts for any Statistics Canada time series vector number. Give it a go. #cdnecon philipsmith.ca/statcan/statca…

Likes: 60 | Replies: 12 | Reposts: 7