Guy Davidson (@guyd33)'s Twitter Profile
Guy Davidson

@guyd33

PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).

ID: 1117859056817823745

Link: https://guydavidson.me
Joined: 15-04-2019 18:35:42

925 Tweets

968 Followers

1.1K Following

Guy Davidson (@guyd33)'s Twitter Profile Photo

Fantastic new work by John (Yueh-Han) Chen (with Brenden Lake and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!

Sonia (@soniajoseph_)'s Twitter Profile Photo

Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 #CVPR2025 Mechanistic Interpretability for Vision @ CVPR2025

Dr. Karen Ullrich (@karen_ullrich)'s Twitter Profile Photo

How would you make an LLM "forget" the concept of dog, or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI, a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.

Guy Davidson (@guyd33)'s Twitter Profile Photo

John has some nice new results showing that some frontier models do worse on our safety benchmark than their predecessors. Take a look!

Guy Davidson (@guyd33)'s Twitter Profile Photo

We've been using smile to develop behavioral web experiments in the lab for the last year+. Everything from the simplest survey-like judgment collections to complex game-like designs (e.g., exps.gureckislab.org/e/laugh-melted…) is easier to develop and deploy. Consider it for your next exp!