Samuel Marks (@saprmarks)'s Twitter Profile
Samuel Marks

@saprmarks

AI safety research @AnthropicAI. Prev postdoc in LLM interpretability with @davidbau, math PhD at @Harvard, director of technical programs at haist.ai

ID: 1712234210109587456

Joined: 11-10-2023 22:30:42

382 Tweets

1.1K Followers

110 Following

What's an RL algorithms researcher's job? To make reward go up.
What's an alignment auditing researcher's job? To, uh... check if models are aligned?

Recent work unlocks a potential answer to this question: To build tools that make auditing agent win rate go up.

New blog post.