Seth Lazar (@sethlazar) 's Twitter Profile
Seth Lazar

@sethlazar

ANU Philosophy Prof working on normative philosophy of computing and sociotechnical AI safety.

ID: 351808995

Link: https://sethlazar.org · Joined: 09-08-2011 19:21:17

2.2K Tweets

6.6K Followers

2.2K Following

Seth Lazar (@sethlazar) 's Twitter Profile Photo

My fave recent example of this. O3 hallucinates, I ask it to search to double check, I enable search, and it acts as tho search is disabled. I guess it’s reasoning that if it searches it’ll discover that it hallucinated and so receive negative reward.

Kevin Roose (@kevinroose) 's Twitter Profile Photo

There is a strain of AI skepticism that is rooted in pretending like it’s still 2021 and nobody can actually use this stuff for themselves. It has survived for longer than I would have guessed!

Chubby♨️ (@kimmonismus) 's Twitter Profile Photo

I don't know what's funnier: that people actually watched the entire 60 minutes and analyzed every second to discover something like that, or the fact that Figure.02 makes packages disappear.

Arvind Narayanan (@random_walker) 's Twitter Profile Photo

The origin story of “AI as Normal Technology”, and lessons learned Many people have asked how the “AI as Normal Technology” paper came to be. This paper has been an (ongoing) journey for me and Sayash Kapoor in developing not just the substance of our arguments but also learning how

Raphaël Millière (@raphaelmilliere) 's Twitter Profile Photo

Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 1/13

Saffron Huang (@saffronhuang) 's Twitter Profile Photo

Newest ⚡ reboot ⚡ 🎙️ post: jessica dai and I discuss forecasting, and how people present unhelpful narratives about the future (mostly by picking on AI 2027, sorry guys)

Why we should view the future as constructed, not predicted

Nathan Lambert (@natolambert) 's Twitter Profile Photo

I'm happy to sell 49% of interconnects for the low price of $500M. I'll work for you too. May be a steal relative to other deals on the market.

Samuel Hammond 🌐🏛 (@hamandcheese) 's Twitter Profile Photo

Is Claude self-conscious? I claim humans evolved self-consciousness for normative score keeping. This is why language, higher agency, and complex morality all emerged simultaneously in human evolution. They are different sides of our capacity to attribute normative statuses and

Gillian Hadfield (@ghadfield) 's Twitter Profile Photo

Six years ago Jack Clark and I proposed regulatory markets as a new model for AI governance that would attract more investment (money and brains) in a democratically legitimate way, fostering AI innovation while ensuring these powerful technologies don’t destabilize or harm

Jesse Hoogland (@jesse_hoogland) 's Twitter Profile Photo

Excellent. Here’s my AI safety blueprint:
- 5am: Wake up. Get 10min of direct monitor light while checking last night’s experiments.
- 6am: Head to the gym to train some SAEs.
- 7am: Red-light therapy while I red-team some model organisms of misalignment.
- 8am: Spend the rest

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

AI agents are transforming the workforce, but workers’ voices are often missing! Where do they want AI help? Which human skills will matter more? We mapped how AI agents could #automate vs. #augment jobs across the U.S. workforce with a worker-first look at the future of work!

Nathan Lambert (@natolambert) 's Twitter Profile Photo

Too many are being sanctimonious about human intelligence in face of the first real thinking machines. They'll be left behind like many who failed to understand technology in the past.

Ed Turner (@edturner42) 's Twitter Profile Photo

1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity...?!

We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition

Gillian Hadfield (@ghadfield) 's Twitter Profile Photo

My lab at Johns Hopkins University is recruiting research and communications professionals, and AI postdocs, to advance our work ensuring that AI is safe and aligned to human well-being worldwide: We're hiring an AI Policy Researcher to conduct in-depth research into the technical and policy

Atoosa Kasirzadeh (@dr_atoosa) 's Twitter Profile Photo

I was planning to launch my substack on "Human, life, AI, and future" in a few months, with something very different. I’ve been working quietly on some exciting research about AI and the future of humanity—big questions, long arcs, and some surprising ideas I was excited to share

David Duvenaud (@davidduvenaud) 's Twitter Profile Photo

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!

Post-AGI Civilizational Equilibria: Are there any good ones?

Vancouver, July 14th

Featuring: Joe Carlsmith, Richard Ngo, Emmett Shear 🧵

Dawn Song (@dawnsongtweets) 's Twitter Profile Photo

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work:

🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects

💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
