@saprmarks: What's an RL algorithms researcher's job? To make reward go up. What's an alignment auditing researcher's job? To uh.. check if models are aligned? Recent work unlocks a potential answer to this question: To build tools that make auditing agent win rate go up. New blog post.

Samuel Marks

@saprmarks

AI safety research @AnthropicAI. Prev postdoc in LLM interpretability with @davidbau, math PhD at @Harvard, director of technical programs at haist.ai

ID: 1712234210109587456

Joined: 11-10-2023 22:30:42

382 Tweets

1.1K Followers

110 Following

a month ago

What's an RL algorithms researcher's job? To make reward go up. What's an alignment auditing researcher's job? To uh.. check if models are aligned? Recent work unlocks a potential answer to this question: To build tools that make auditing agent win rate go up. New blog post.

168 likes

2 replies

9 retweets