
Micah Carroll
@micahcarroll
AI PhD student @berkeley_ai w/ @ancadianadragan & Stuart Russell. Working on AI safety ⊃ preference changes/AI manipulation.
ID: 356711942
http://micahcarroll.github.io · Joined: 17-08-2011 07:40:21
657 Tweets
1.1K Followers
669 Following





The 2nd Workshop on Models of Human Feedback for AI Alignment will take place at ICML 2025 on 18/19 July in Vancouver! Submit here: openreview.net/group?id=ICML.… 📅 Deadline: May 25th, 2025 (AoE) 🔗 More info: sites.google.com/view/mhf-icml2… Hope to see you in Vancouver!




My colleague Ian McKenzie spent six hours red-teaming Claude 4 Opus and easily bypassed safeguards designed to block WMD development. Claude gave >15 pages of non-redundant instructions for producing sarin gas, describing all key steps in the manufacturing process.



New paper! With Joshua Clymer, Jonah Weinbaum, and others, we’ve written a safety case for safeguards against misuse. We lay out how developers can connect safeguard evaluation results to real-world decisions about how to deploy models. 🧵


What to do about gradual disempowerment? We laid out a research agenda with all the concrete and feasible research projects we can think of: 🧵 With Raymond Douglas, Jan Kulveit, and David Krueger.


LLMs' sycophancy issues are a predictable result of optimizing for user feedback. Even if overt sycophantic behaviors get fixed, AIs' exploitation of our cognitive biases may only become more subtle. Grateful our research on this was featured by Nitasha Tiku & The Washington Post!


A great Washington Post story to be quoted in. I spoke to Nitasha Tiku about our work on human-AI relationships, as well as early results from our University of Oxford survey of 2,000 UK citizens showing that ~30% have sought AI companionship, emotional support, or social interaction in the past year.

