Bowen Baker (@bobabowen) Twitter Tweets • TwiCopy

Bowen Baker

@bobabowen

+ Follow

Research Scientist at @openai since 2017
Robotics, Multi-Agent Reinforcement Learning, LM Reasoning, and now Alignment.

ID: 817950453849845760

linkhttps://bowenbaker.github.io/ calendar_today08-01-2017 04:25:59

19 Tweet

1,1K Followers

104 Following

Bowen Baker

@bobabowen

8 months ago

Excited to share what my team has been working on at OpenAI!

thumb_up_off_alt219

chat_bubble_outline9

repeat11

shareShare

One direction I'm excited to see more work on in the future is CoT monitoring as a potential scalable oversight method. In our work, we found that we could monitor a strong reasoning model (same class as o1 or o3-mini) with a weaker model (gpt-4o).

thumb_up_off_alt24

chat_bubble_outline1

repeat3

shareShare

Ryan Greenblatt

@ryanpgreenblatt

8 months ago

IMO, this isn't much of an update against CoT monitoring hopes. They show unfaithfulness when the reasoning is minimal enough that it doesn't need CoT. But, my hopes for CoT monitoring are because models will have to reason a lot to end up misaligned and cause huge problems. 🧵

thumb_up_off_alt157

chat_bubble_outline5

repeat16

shareShare

Bowen Baker

@bobabowen

8 months ago

Worth a read and adding into your bank of potential futures.

thumb_up_off_alt8

chat_bubble_outline0

repeat0

shareShare

Bowen Baker

@bobabowen

7 months ago

eep

thumb_up_off_alt17

chat_bubble_outline0

repeat0

shareShare

Mikita Balesni 🇺🇦

@balesni

4 months ago

A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:

thumb_up_off_alt403

chat_bubble_outline26

repeat98

shareShare

Max Zeff

@zeffmax

4 months ago

New: Researchers from OpenAI, DeepMind, and Anthropic are calling for an industry-wide push to evaluate, preserve, and improve the "thoughts" externalized by AI reasoning models. I spoke with Bowen Baker, who was involved in the position paper, for TechCrunch.

thumb_up_off_alt20

chat_bubble_outline2

repeat2

shareShare

Rohin Shah

@rohinmshah

4 months ago

Chain of thought monitoring looks valuable enough that we’ve put it in our Frontier Safety Framework to address deceptive alignment. This paper is a good explanation of why we’re optimistic – but also why it may be fragile, and what to do to preserve it. x.com/balesni/status…

thumb_up_off_alt71

chat_bubble_outline1

repeat6

shareShare

Tomek Korbak

@tomekkorbak

4 months ago

The holy grail of AI safety has always been interpretability. But what if reasoning models just handed it to us in a stroke of serendipity? In our new paper, we argue that the AI community should turn this serendipity into a systematic AI safety agenda!🛡️

thumb_up_off_alt94

chat_bubble_outline6

repeat13

shareShare

Jakub Pachocki

@merettm

4 months ago

I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview. As AI systems spend more compute working e.g. on long term research problems, it is

thumb_up_off_alt404

chat_bubble_outline23

repeat66

shareShare

Bowen Baker

@bobabowen

4 months ago

Don't think about a pink elephant

thumb_up_off_alt4

chat_bubble_outline0

repeat0

shareShare