
Matthew Kowal
@matthewkowal9
Research Resident @FARAIResearch / PhD @YorkUniversity @VectorInst / Previously @UbisoftLaForge @ToyotaResearch @_NextAI / AI Safety + Interpretability
ID: 1107409539635257347
http://mkowal2.github.io/ 17-03-2019 22:33:03
767 Tweet
407 Followers
290 Following












Accepted at #ICML2025! Check out the preprint. Shoutout to the group for an AMAZING research journey Harry Thasarathan Julian Thomas Fel Matthew Kowal This is Harry’s first PhD paper (first year, great start) and Julian’s first ever paper (work done as an undergrad 💪).



My colleague Ian McKenzie spent six hours red-teaming Claude 4 Opus, and easily bypassed safeguards designed to block WMD development. Claude gave >15 pages of non-redundant instructions for sarin gas, describing all key steps in the manufacturing process.



Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 #CVPR2025 Mechanistic Interpretability for Vision @ CVPR2025


