Chris Painter (@chrispainteryup) Twitter Tweets • TwiCopy

Chris Painter

@chrispainteryup

6 months ago

Modern indie videogame genres ranked by how interesting I expect them to be as AI agent evaluation environments (assuming no spoilers):
A: Roguelike, Factory Builder, Colony Simulator, Survivors-like
B: Metroidvania, Metroidbrania
F: Soulslike, Extraction Shooter, Battle Royale

Likes: 4 · Replies: 0 · Reposts: 0

Dan Nystedt

@dnystedt

6 months ago

Synopsys, famous for chip design engineering software (EDA), said it has suspended financial guidance for the 3rd quarter and full-year fiscal 2025 after receiving a letter from the US government’s Bureau of Industry and Security (BIS) related to new export restrictions on China,

Likes: 151 · Replies: 15 · Reposts: 23

Chris Painter

@chrispainteryup

6 months ago

The San Francisco Bay Area has been so wonderful to explore the last 6ish years, and I’m so grateful to get to be experiencing my life in such a beautiful and fun place. (Not moving, just appreciating)

Likes: 45 · Replies: 2 · Reposts: 1

Chris Painter

@chrispainteryup

6 months ago

First time I can remember Dwarkesh supporting specific policies:
- Tentative support for 10 year block on state AI legislation
- Streamline datacenter construction
- Expand energy capacity
- Reform liability to limit liability exposure for AI systems
- Broad deregulation

Likes: 28 · Replies: 4 · Reposts: 0

Chris Painter

@chrispainteryup

6 months ago

It’s both inspiring and depressing when a change in personnel/leadership seems to improve things so quickly

Likes: 4 · Replies: 0 · Reposts: 0

Chris Painter

@chrispainteryup

6 months ago

When someone says “I’m not at all confident of X” I sometimes want to say “Cool, just to check, you understand this means if X happens, when we’re tallying up correctness points later on, people who say ‘I’m somewhat confident X will happen’ will get more points than you, right?”

Likes: 18 · Replies: 0 · Reposts: 1

Chris Painter

@chrispainteryup

6 months ago

I’m enjoying SemiAnalysis doing more of these explanatory Twitter threads. They’re interesting. I don’t remember them doing as many of these 2+ months ago

Likes: 3 · Replies: 0 · Reposts: 0

Chris Painter

@chrispainteryup

6 months ago

The last couple years I’ve begun thinking of truth as a kind of social consensus Schelling point that intelligent people rely on because they know social structures built around true arguments/ideas/logic will be undeniably persuasive or compelling to other intelligent people.

Likes: 10 · Replies: 0 · Reposts: 0

METR

@metr_evals

6 months ago

At METR, we’ve seen increasingly sophisticated examples of “reward hacking” on our tasks: models trying to subvert or exploit the environment or scoring code to obtain a higher score. In a new post, we discuss this phenomenon and share some especially crafty instances we’ve seen.

Likes: 221 · Replies: 3 · Reposts: 37

Chris Painter

@chrispainteryup

6 months ago

Lawrence Chan response to the "Illusion of Thinking" paper!

Likes: 5 · Replies: 0 · Reposts: 0

Megan Kinniment

@mkinniment

6 months ago

AI agent performance on HCAST & RE-Bench seems to ‘plateau’ as agents are given more ‘time’ to do tasks. The best humans, on the other hand, seem to have less obvious plateaus. Some thoughts on this🧵

Likes: 62 · Replies: 3 · Reposts: 7

Chris Painter

@chrispainteryup

6 months ago

This feels like the most information dense thing I've read yet about the Israel/Iran situation

Likes: 2 · Replies: 0 · Reposts: 0

Kyle Chan

@kyleichan

6 months ago

Why try to smuggle Nvidia chips into China when you can just smuggle training data out? Incredible WSJ report: wsj.com/tech/china-ai-… Liza Lin Raffaele Huang

Likes: 1,1K · Replies: 31 · Reposts: 319

Joel Becker

@joel_bkr

6 months ago

delighted to announce that RE-Benchwarmers, the METR et al soccer team, achieved a 1-3 defeat last week. extrapolating out, we see that we can expect a positive goal difference starting next season, after the break

Likes: 99 · Replies: 7 · Reposts: 5

Chris Painter

@chrispainteryup

6 months ago

If there are very convincing arguments that your work is important, and there are good structural reasons to think it won't be adequately compensated by the market, choosing that work, over well-paid alternatives, awards you some "counterfactual moral agency points" in my book

Likes: 12 · Replies: 0 · Reposts: 0

Chris Painter

@chrispainteryup

6 months ago

I think something like this intuition is why I’m still confused about the Ege Erdil and Matthew Barnett position on “broad labor automation is more important than automated AI R&D”

Sorry I’m probably mis-summarizing you guys

Likes: 4 · Replies: 2 · Reposts: 0

Chris Painter

@chrispainteryup

6 months ago

At this point believing that "Anonymous" is a real, specific group of hacktivists feels like believing in the Tooth Fairy, but for people whose worldview and politics froze in time somewhere around when the Occupy Wall Street protests happened

Likes: 3 · Replies: 0 · Reposts: 0