Ryan Kidd (@ryan_kidd44)'s Twitter Profile
Ryan Kidd

@ryan_kidd44

Co-Director @MATSprogram, Co-Founder @LondonSafeAI, Regrantor @Manifund | PhD in physics | Accelerate AI alignment + build a better future for all

ID: 1102399276334759936

Joined: 04-03-2019 02:44:03

1.1K Tweets

1.1K Followers

1.1K Following

Marius Hobbhahn (@mariushobbhahn)

We're hiring for Research Scientists / Engineers!
- We closely work with all frontier labs
- We're a small org and can move fast
- We can choose our own agenda and what we publish
We're especially looking for people who enjoy fast empirical research. Deadline: 31 Oct!

Vaidehi Agarwalla (@vaidehiagrwalla)

we're hiring for several roles at Theorem - reach out if you're exploring!
- ml research scientists
- ml research engineers
- compiler engineers
- senior SWEs

Wenx (@firebirdwen)


🔔New paper: Can reasoning models hide their reasoning?
We stress-tested Chain-of-Thought (CoT) monitoring and found that while monitors detect ~96% of hidden malicious intent under normal conditions, ⚠️detection can collapse to ~10% under strong obfuscation pressure.  🧵
Scott Alexander (@slatestarcodex)

Sriram Krishnan, Ryonan, Dean W. Ball: Thanks for your interest. I'm not expecting too much danger in the next 18 months, so these would mostly be small updates, but to answer the question:

MORE WORRIED:
- Anything that looks like shorter timelines, especially superexponential progress on METR time horizons graph or

Stanislav Fort (@stanislavfort)


In 2025, only 4 security vulnerabilities with CVEs were disclosed in OpenSSL = the crypto library securing most of the internet. 

Aisle's autonomous AI system discovered 3 out of the 4. And proposed the fixes that remediated them.
AI Impacts (@aiimpacts)


Our surveys’ findings that AI researchers assign a median 5-10% to extinction or similar made a splash (NYT, NBC News, TIME..)

But people sometimes underestimate our survey’s methodological quality due to various circulating misconceptions.

Today, an FAQ correcting key errors:
Sharan (@_maiush)


AI that is “forced to be good” v “genuinely good”
Should we care about the difference? (yes!)

We’re releasing the first open implementation of character training. We shape the persona of AI assistants in a more robust way than alternatives like prompting or activation steering.
Andy Masley (@andymasley)

Hammered out some thoughts on why I was motivated to post a lot about data centers: the popular conversation about them has been disproportionately informed by very low-trust intuitions I think are bad andymasley.substack.com/p/data-centers…

Mike McCormick (@mikemccormick_)


This post is an experiment!

I want to fund & help accomplished people and nascent companies + nonprofits working to make AI safe, secure and good for humanity.

If you're doing that, or know somebody super credible who is, ping me.

And if you're a fan of Halcyon Futures' work,
Ryan Kidd (@ryan_kidd44)

I wrote a blog post on why I think the AI safety ecosystem undervalues founders and field-builders and what to do about it! lesswrong.com/posts/yw9B5jQa…

Ryan Kidd (@ryan_kidd44)

As a counterpoint to "e/accs", I like the label "AI safers". This is:
- Less unwieldy than "AI notkilleveryoneists"
- More accurate than "AI doomers"
- More inclusive than "EAs"
"Safer" also implies that AI can be made more safe by gradation, rather than being an absolutist term.