Samuel Marks (@saprmarks)'s Twitter Profile
Samuel Marks

@saprmarks

AI safety research @AnthropicAI. Prev postdoc in LLM interpretability with @davidbau, math PhD at @Harvard, director of technical programs at haist.ai

ID: 1712234210109587456

11-10-2023 22:30:42

382 Tweets

1.1K Followers

110 Following

New paper with Johannes Treutlein, Evan Hubinger, and many other coauthors! We train a model with a hidden misaligned objective and use it to run an auditing game: Can other teams of researchers uncover the model’s objective? x.com/AnthropicAI/st…