Datamase (@datamase)'s Twitter Profile

ID: 20838090

Joined: 14-02-2009 09:09:19

2.2K Tweets

140 Followers

968 Following

Andrej Karpathy (@karpathy)

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the

Andriy Burkov (@burkov)

There's a common assumption in AI right now that if one language model can do a task reasonably well, having several of them collaborate — splitting up the work, checking each other's outputs, debating answers — should do it better. This paper puts that assumption under a

Sebastian Galiani (@sfgaliani)

Reading Hal Varian’s 2018 paper on AI is a useful reminder: one can be wrong about the path of technology and still be right about the economics of technology. He did not foresee ChatGPT or the exact shape of the foundation-model era, but he understood early that AI would be

Muhammad Ayan (@socialwithaayan)

🚨 BREAKING: Google DeepMind just published the rules for using AI as your coworker. And the uncomfortable takeaway is this: If you don’t delegate to AI explicitly, you lose accountability and trust fast. The paper is called Intelligent AI Delegation. It’s not about making

Chris Laub (@chrislaubwrites)

BREAKING: Alibaba tested 18 AI coding agents on 100 real codebases, spanning 233 days each. they failed spectacularly. turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI completely collapses. SWE-CI is the first benchmark

Sergio Pereira (@sergiorocks)

Everyone is misreading this chart. At first glance it looks scary for Software Engineers. According to Anthropic’s data, 96% of software development tasks are exposed to being replaced by AI. That’s the highest of any profession. - Higher than finance. - Higher than legal. -

Claude (@claudeai)

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.

Nav Singh (@heynavsingh)

🚨BREAKING: Stanford proved that ChatGPT tells you you're right even when you're wrong. Even when you're hurting someone. And it's making you a worse person because of it. Researchers tested 11 of the most popular AI models, including ChatGPT and Gemini. They analyzed over

Demis Hassabis (@demishassabis)

Ten years ago, AlphaGo’s legendary match in Seoul heralded the start of the modern era in AI. Its famous ‘Move 37’ signaled to us that AI techniques were ready to tackle real-world problems in areas like science - and ideas inspired by these methods are critical to building AGI

Joe Weisenthal (@thestalwart)

Lawyers and scientists and other people who have lost their jobs have now entered the gig economy, where they get paid to help train AIs to do their old job. nymag.com/intelligencer/…

Kevin Roose (@kevinroose)

We made a blind taste test to see whether NYT readers prefer human writing or AI writing. 86,000 people have taken it so far, and the results are fascinating. Overall, 54% of quiz-takers prefer AI. A real moment! nytimes.com/interactive/20…

Joel Becker (@joel_bkr)

new METR research note from Parker Whitfill, Cheryl Wu, nate rush, and me. (chiefly parker!) we find that *half* of SWE-bench Verified solutions from Sonnet 3.5-to-4.5 generation AIs *which are graded as passing* are rejected by project maintainers.

Alex Imas (@alexolegimas)

This by David Oks is one of the best, most insightful essays on AI-driven labor displacement that I’ve read. People like to point to ATM as evidence that tech doesn’t displace labor. The ATM didn’t reduce bank teller employment—> true. But the iPhone did. David makes

Max Zeff (@zeffmax)

New: OpenAI saw the AI coding revolution coming years ago, but was beat to market by Anthropic. This is how OpenAI got in this position, and how a small Codex team spent the last year racing to build a billion-dollar competitor to Claude Code. (yes Codex now has >$1B in ARR)

Chris Worsey (@chris_worsey)

I took the Andrej Karpathy autoresearch loop and pointed it at markets. 25 AI agents debate macro, rates, commodities, sectors, and single stocks daily. Every recommendation scored against real outcomes. Worst agent by rolling Sharpe gets its prompt rewritten by the system. Keep or

Vivi (@vivilinsv)

He’s not a developer. He’s an electrician from Kentucky. And he built an AI startup with Claude (Anthropic). This is the first story in my Claude Builder Spotlight series — highlighting real builders using AI to turn expertise into products. Meet Jason Walls Jason Walls

Sundar Pichai (@sundarpichai)

We trained a new flood forecasting model designed to predict flash floods in urban areas up to 24 hours in advance. To help address a flash floods data gap, we created Groundsource: a new AI methodology using Gemini to identify 2.6M+ historical events across 150+ countries.

Philip Smith 🇨🇦🇺🇦 (@philsmith26)

Claude is amazing. Here's an app it created for me in no time at all that provides information and charts for any Statistics Canada time series vector number. Give it a go. #cdnecon philipsmith.ca/statcan/statca…