Seth Lazar (@sethlazar) 's Twitter Profile
Seth Lazar

@sethlazar

ANU Philosophy Prof working on normative philosophy of computing and sociotechnical AI safety.

ID: 351808995

Link: https://sethlazar.org · Joined: 09-08-2011 19:21:17

2.2K Tweets

6.6K Followers

2.2K Following

Seth Lazar (@sethlazar) 's Twitter Profile Photo

My fave recent example of this. O3 hallucinates, I ask it to search to double check, I enable search, and it acts as tho search is disabled. I guess it’s reasoning that if it searches it’ll discover that it hallucinated and so receive negative reward.

Kevin Roose (@kevinroose) 's Twitter Profile Photo

There is a strain of AI skepticism that is rooted in pretending like it’s still 2021 and nobody can actually use this stuff for themselves. It has survived for longer than I would have guessed!

Chubby♨️ (@kimmonismus) 's Twitter Profile Photo

I don't know what's funnier: that people actually watched the entire 60 minutes and analyzed every second to discover something like that, or the fact that Figure.02 makes packages disappear.

Arvind Narayanan (@random_walker) 's Twitter Profile Photo

The origin story of “AI as Normal Technology”, and lessons learned Many people have asked how the “AI as Normal Technology” paper came to be. This paper has been an (ongoing) journey for me and Sayash Kapoor in developing not just the substance of our arguments but also learning how

Raphaël Millière (@raphaelmilliere) 's Twitter Profile Photo

Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 1/13

Saffron Huang (@saffronhuang) 's Twitter Profile Photo

Newest ⚡ reboot ⚡ 🎙️ post: jessica dai and I discuss forecasting, and how people present unhelpful narratives about the future (mostly by picking on AI 2027, sorry guys)

Why we should view the future as constructed, not predicted

Nathan Lambert (@natolambert) 's Twitter Profile Photo

I'm happy to sell 49% of interconnects for the low price of $500M. I'll work for you too. May be a steal relative to other deals on the market.

Samuel Hammond 🌐🏛 (@hamandcheese) 's Twitter Profile Photo

Is Claude self-conscious? I claim humans evolved self-consciousness for normative score keeping. This is why language, higher agency, and complex morality all emerged simultaneously in human evolution. They are different sides of our capacity to attribute normative statuses and

Gillian Hadfield (@ghadfield) 's Twitter Profile Photo

Six years ago Jack Clark and I proposed regulatory markets as a new model for AI governance that would attract more investment (money and brains) in a democratically legitimate way, fostering AI innovation while ensuring these powerful technologies don’t destabilize or harm

Jesse Hoogland (@jesse_hoogland) 's Twitter Profile Photo

Excellent. Here’s my AI safety blueprint:
- 5am: Wake up. Get 10min of direct monitor light while checking last night’s experiments.
- 6am: Head to the gym to train some SAEs.
- 7am: Red-light therapy while I red-team some model organisms of misalignment.
- 8am: Spend the rest

Diyi Yang (@diyi_yang) 's Twitter Profile Photo

AI agents are transforming the workforce, but workers’ voices are often missing! Where do they want AI help? Which human skills will matter more? We mapped how AI agents could #automate vs. #augment jobs across the U.S. workforce with a worker-first look at the future of work!

Nathan Lambert (@natolambert) 's Twitter Profile Photo

Too many are being sanctimonious about human intelligence in face of the first real thinking machines. They'll be left behind like many who failed to understand technology in the past.

Ed Turner (@edturner42) 's Twitter Profile Photo

1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity...?!

We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition

Gillian Hadfield (@ghadfield) 's Twitter Profile Photo

My lab at Johns Hopkins University is recruiting research and communications professionals, and AI postdocs, to advance our work ensuring that AI is safe and aligned to human well-being worldwide: We're hiring an AI Policy Researcher to conduct in-depth research into the technical and policy

Atoosa Kasirzadeh (@dr_atoosa) 's Twitter Profile Photo

I was planning to launch my substack on "Human, life, AI, and future" in a few months, with something very different. I’ve been working quietly on some exciting research about AI and the future of humanity—big questions, long arcs, and some surprising ideas I was excited to share

David Duvenaud (@davidduvenaud) 's Twitter Profile Photo

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!

Post-AGI Civilizational Equilibria: Are there any good ones?

Vancouver, July 14th

Featuring: Joe Carlsmith, Richard Ngo, Emmett Shear 🧵

Dawn Song (@dawnsongtweets) 's Twitter Profile Photo

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work:

🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects

💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖
