Emmanuel Ameisen
@mlpowered
Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
ID: 878315447048839168
https://mlpowered.com/book/
23-06-2017 18:14:55
2.2K Tweets
8.8K Followers
225 Following
swyx Emmanuel Ameisen Anthropic Andon Labs All watched over by vending machines of loving grace
In which the gang (Runjin Chen, Andy Arditi, Jack Lindsey):
- identifies vectors for bad personas (evil, sycophancy, hallucinations, etc.)
- shows that if you inject the bad vectors in training, the model learns to not do the bad thing!!
aka vaccines but for LLMs
New research with coauthors at Paul Jankura, Google DeepMind, EleutherAI, and Decode Research! We expand on and open-source Anthropic’s foundational circuit-tracing work. Brief highlights in thread: (1/7)