Chris Painter (@chrispainteryup)'s Twitter Profile
Chris Painter

@chrispainteryup

policy director @METR_Evals, evals accelerationist. working hard on responsible scaling policies

ID: 930159122535866369

Joined: 13 November 2017, 19:43:11

1.1K Tweets

1.1K Followers

1.1K Following

Chris Painter (@chrispainteryup)

Modern indie videogame genres ranked by how interesting I expect them to be as AI agent evaluation environments (assuming no spoilers):
A: Roguelike, Factory Builder, Colony Simulator, Survivors-like
B: Metroidvania, Metroidbrania
F: Soulslike, Extraction Shooter, Battle Royale

Dan Nystedt (@dnystedt)

Synopsys, famous for chip design engineering software (EDA), said it has suspended financial guidance for the 3rd quarter and full-year fiscal 2025 after receiving a letter from the US government’s Bureau of Industry and Security (BIS) related to new export restrictions on China,

Chris Painter (@chrispainteryup)

The San Francisco Bay Area has been so wonderful to explore the last 6ish years, and I’m so grateful to get to be experiencing my life in such a beautiful and fun place. (Not moving, just appreciating)

Chris Painter (@chrispainteryup)

First time I can remember Dwarkesh supporting specific policies:
- Tentative support for 10 year block on state AI legislation
- Streamline datacenter construction
- Expand energy capacity
- Reform liability to limit liability exposure for AI systems
- Broad deregulation

Chris Painter (@chrispainteryup)

When someone says “I’m not at all confident of X,” I sometimes want to say “Cool, just to check, you understand this means if X happens, when we’re tallying up correctness points later on, people who say ‘I’m somewhat confident X will happen’ will get more points than you, right?”

Chris Painter (@chrispainteryup)

I’m enjoying SemiAnalysis doing more of these explanatory Twitter threads. They’re interesting. I don’t remember them doing as many of these 2+ months ago.

Chris Painter (@chrispainteryup)

The last couple years I’ve begun thinking of truth as a kind of social consensus Schelling point that intelligent people rely on because they know social structures built around true arguments/ideas/logic will be undeniably persuasive or compelling to other intelligent people.

METR (@metr_evals)

At METR, we’ve seen increasingly sophisticated examples of “reward hacking” on our tasks: models trying to subvert or exploit the environment or scoring code to obtain a higher score. In a new post, we discuss this phenomenon and share some especially crafty instances we’ve seen.

Megan Kinniment (@mkinniment)

AI agent performance on HCAST & RE-Bench seems to ‘plateau’ as agents are given more ‘time’ to do tasks. The best humans, on the other hand, seem to have less obvious plateaus. Some thoughts on this🧵

Kyle Chan (@kyleichan)

Why try to smuggle Nvidia chips into China when you can just smuggle training data out? Incredible WSJ report by Liza Lin and Raffaele Huang: wsj.com/tech/china-ai-…

Joel Becker (@joel_bkr)

delighted to announce that RE-Benchwarmers, the METR et al soccer team, achieved a 1-3 defeat last week. extrapolating out, we see that we can expect a positive goal difference starting next season, after the break

Chris Painter (@chrispainteryup)

If there are very convincing arguments that your work is important, and there are good structural reasons to think it won't be adequately compensated by the market, choosing that work, over well-paid alternatives, awards you some "counterfactual moral agency points" in my book

Chris Painter (@chrispainteryup)

I think something like this intuition is why I’m still confused about the Ege Erdil and Matthew Barnett position that “broad labor automation is more important than automated AI R&D”. Sorry, I’m probably mis-summarizing you guys.

Chris Painter (@chrispainteryup)

At this point believing that "Anonymous" is a real, specific group of hacktivists feels like believing in the Tooth Fairy, but for people whose worldview and politics froze in time somewhere around when the Occupy Wall Street protests happened