
Emmanuel Ameisen
@mlpowered
Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
ID: 878315447048839168
https://mlpowered.com/book/
23-06-2017 18:14:55
2.2K Tweets
8.8K Followers
225 Following


swyx Emmanuel Ameisen Anthropic Andon Labs All watched over by vending machines of loving grace

In which the gang (Runjin Chen, Andy Arditi, Jack Lindsey):
- identifies vectors for bad personas (evil, sycophancy, hallucinations, etc.)
- shows that if you inject the bad vectors in training, the model learns to not do the bad thing!!

aka vaccines but for LLMs

New research with coauthors at Paul Jankura, Google DeepMind, EleutherAI, and Decode Research! We expand on and open-source Anthropic’s foundational circuit-tracing work. Brief highlights in thread: (1/7)


