
Pranav Ramesh
@pranavramesh25
@harvard @neo @zfellows | prev. @coframe_ai, @tryramp
ID: 2353713912
https://www.pranavramesh.com/ 20-02-2014 19:54:24
124 Tweets
472 Followers
305 Following










New blog post with Armaan Tipirneni! Following Emergent Misalignment, we show that finetuning even a single layer via LoRA on insecure code can induce toxic outputs in Qwen2.5-Coder-32B-Instruct, and that you can extract steering vectors to make the base model similarly misaligned 🧵







