Daniel Eth (yes, Eth is my actual last name) (@daniel_271828)'s Twitter Profile

Researching effects of automated AI R&D | pro-America, pro-tech, & pro-AI safety

ID: 766076479222456320

Link: https://medium.com/@daniel_eth
Joined: 18-08-2016 00:57:20

37.37K Tweets

8.8K Followers

905 Following

Ethan Mollick (@emollick)

The X discussion about the Claude 4 system card is getting counterproductive

It punishes Anthropic for actually releasing full safety tests and admitting to unusual behaviors. And I bet the behaviors of other models are really similar to Claude & now more labs will hide results.

James Campbell (@jam3scampbell)

what's new is that we have real agents that can operate computers in an open-ended way. and that open-ended action space is gonna include a ton of real actions like "call_police" that you weren't thinking of when you decided to fuck around and put the model in an extreme scenario

Palisade Research (@palisadeai)

🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down.

Palisade Research (@palisadeai)

⚠️ But as far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.

Rafael Ruiz ⏸️🔸 (@rafaruizdelira)

To be fair, it's pretty ridiculous to me that people could build houses, churches and castles for centuries and they hadn't invented the Cartesian plane. It's an invention that feels like the Egyptians or Ancient Greeks could have come up with it. Low hanging fruit.

Dean W. Ball (@deanwball)

“…with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention.”

AI Notkilleveryoneism Memes ⏸️ (@aisafetymemes)

Elon Musk Elon I know the world has a lot of problems and you want to fix them all but this is IT. This is the one. You, more than anybody, know what's at stake. You know how little time is left. You know there are no adults in the room. Humanity needs you to focus!