Bowen Baker (@bobabowen) 's Twitter Profile
Bowen Baker

@bobabowen

Research Scientist at @openai since 2017
Robotics, Multi-Agent Reinforcement Learning, LM Reasoning, and now Alignment.

ID: 817950453849845760

linkhttps://bowenbaker.github.io/ calendar_today08-01-2017 04:25:59

19 Tweet

1,1K Takipçi

104 Takip Edilen

Bowen Baker (@bobabowen) 's Twitter Profile Photo

One direction I'm excited to see more work on in the future is CoT monitoring as a potential scalable oversight method. In our work, we found that we could monitor a strong reasoning model (same class as o1 or o3-mini) with a weaker model (gpt-4o).

One direction I'm excited to see more work on in the future is CoT monitoring as a potential scalable oversight method. In our work, we found that we could monitor a strong reasoning model (same class as o1 or o3-mini) with a weaker model (gpt-4o).
Ryan Greenblatt (@ryanpgreenblatt) 's Twitter Profile Photo

IMO, this isn't much of an update against CoT monitoring hopes. They show unfaithfulness when the reasoning is minimal enough that it doesn't need CoT. But, my hopes for CoT monitoring are because models will have to reason a lot to end up misaligned and cause huge problems. 🧵

Mikita Balesni 🇺🇦 (@balesni) 's Twitter Profile Photo

A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:

A simple AGI safety technique: AI’s thoughts are in plain English, just read them

We know it works, with OK (not perfect) transparency!

The risk is fragility: RL training, new architectures, etc threaten transparency

Experts from many orgs agree we should try to preserve it:
Max Zeff (@zeffmax) 's Twitter Profile Photo

New: Researchers from OpenAI, DeepMind, and Anthropic are calling for an industry-wide push to evaluate, preserve, and improve the "thoughts" externalized by AI reasoning models. I spoke with Bowen Baker, who was involved in the position paper, for TechCrunch.

New: Researchers from OpenAI, DeepMind, and Anthropic are calling for an industry-wide push to evaluate, preserve, and improve the "thoughts" externalized by AI reasoning models.

I spoke with <a href="/bobabowen/">Bowen Baker</a>, who was involved in the position paper, for TechCrunch.
Rohin Shah (@rohinmshah) 's Twitter Profile Photo

Chain of thought monitoring looks valuable enough that we’ve put it in our Frontier Safety Framework to address deceptive alignment. This paper is a good explanation of why we’re optimistic – but also why it may be fragile, and what to do to preserve it. x.com/balesni/status…

Tomek Korbak (@tomekkorbak) 's Twitter Profile Photo

The holy grail of AI safety has always been interpretability. But what if reasoning models just handed it to us in a stroke of serendipity? In our new paper, we argue that the AI community should turn this serendipity into a systematic AI safety agenda!🛡️

Jakub Pachocki (@merettm) 's Twitter Profile Photo

I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview. As AI systems spend more compute working e.g. on long term research problems, it is