Guy Davidson (@guyd33) Twitter Tweets • TwiCopy

Guy Davidson

@guyd33


PhD @NYUDataScience, visiting researcher @AIatMeta, interested in AI & CogSci, specifically in goals and their representations in minds and machines (he/him).

ID: 1117859056817823745

Link: https://guydavidson.me
Joined: 15-04-2019 18:35:42

925 Tweets

968 Followers

1.1K Following

Guy Davidson

@guyd33

6 months ago

Fantastic new work by John (Yueh-Han) Chen (with Brenden Lake and me trying not to cause too much trouble). We study systematic generalization in a safety setting and find LLMs struggle to consistently respond safely when we vary how we ask naive questions. More fun analyses in the paper!

Likes: 6 | Replies: 0 | Retweets: 2

Sonia

@soniajoseph_

6 months ago

Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉 We’ll be in Nashville next week. Come say hi 👋 #CVPR2025 Mechanistic Interpretability for Vision @ CVPR2025

Likes: 288 | Replies: 3 | Retweets: 31

Guy Davidson

@guyd33

6 months ago

You (yes, you!) should work with Sydney! Either short-term this summer, or longer term at her nascent lab at NYU!

Likes: 10 | Replies: 0 | Retweets: 0

Guy Davidson

@guyd33

5 months ago

Today! Come hear from some wonderful folks about problem solving and design at 1 PM PT / 4 PM ET / 8 PM UTC

Likes: 12 | Replies: 0 | Retweets: 0

Dr. Karen Ullrich

@karen_ullrich

5 months ago

How would you make an LLM "forget" the concept of dog — or any other arbitrary concept? 🐶❓ We introduce SAMD & SAMI — a novel, concept-agnostic approach to identify and manipulate attention modules in transformers.

Likes: 77 | Replies: 3 | Retweets: 12

Guy Davidson

@guyd33

5 months ago

Cool new work on localizing and removing concepts using attention heads from colleagues at NYU and Meta!

Likes: 5 | Replies: 0 | Retweets: 0

Guy Davidson

@guyd33

4 months ago

John has some nice new results showing that some frontier models do worse on our safety benchmark than their predecessors. Take a look!

Likes: 3 | Replies: 0 | Retweets: 2

Guy Davidson

@guyd33

4 months ago

We've been using smile to develop behavioral web experiments in the lab for the last year+. Everything from the simplest survey-like judgment collections to complex game-like designs (e.g., exps.gureckislab.org/e/laugh-melted…) is easier to develop and deploy. Consider it for your next exp!

Likes: 6 | Replies: 0 | Retweets: 0