 
                                Michal Golovanevsky
@michalgolov
CS PhD student @BrownCSDept | Multimodal Learning | Mechanistic Interpretability | Clinical Deep Learning.
ID: 1573399875278049280
https://github.com/michalg04 23-09-2022 19:52:33
22 Tweets
32 Followers
42 Following
 
The finding that important attention heads implement one of a small set of interpretable functions boosts transparency and trust in VLMs. Michal Golovanevsky Vedant Palit #nlp #mechinterp Paper: export.arxiv.org/pdf/2406.16320 GitHub: github.com/wrudman/NOTICE… [5/5]
 
How do VLMs balance visual information presented in-context with linguistic priors encoded in-weights? In this project, Michal Golovanevsky and William Rudman find out! My favorite result: you can find a vector that shifts attention to image tokens and changes the VLM's response!
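
The tweet above describes this intervention vector but not how it is applied; below is a minimal sketch, assuming the standard steering-vector setup: a fixed direction added to a layer's hidden states at inference time via a forward hook so the model leans more on image tokens. The toy layer, the scale `alpha`, and the way `steer` is obtained are placeholders, not the authors' released implementation.

```python
# Minimal steering-vector sketch (PyTorch). A toy linear layer stands in for
# one transformer block of a VLM; in practice the hook would be registered on
# a real language-model layer and `steer` would be a direction found to shift
# attention mass toward image tokens.
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(16, 16)            # placeholder for one VLM block
hidden = torch.randn(1, 8, 16)       # (batch, tokens, d_model)

steer = torch.randn(16)              # hypothetical steering direction
steer = steer / steer.norm()

def add_steering(module, inputs, output, alpha=4.0):
    # Add the scaled steering vector to every token's hidden state.
    return output + alpha * steer

baseline = layer(hidden)
handle = layer.register_forward_hook(add_steering)
steered = layer(hidden)              # same input, intervened forward pass
handle.remove()

print("mean |delta hidden|:", (steered - baseline).abs().mean().item())
```

In a real VLM the comparison of interest would be the generated answer with and without the hook, not the raw hidden-state difference printed here.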
 
![William Rudman (@williamrudmanjr) on Twitter photo By visualizing cross-attention patterns, we've discovered that these universal heads fall into three functional categories: implicit image segmentation, object inhibition, and outlier inhibition [4/5]](https://pbs.twimg.com/media/GRbNUQBaYAAf7RL.jpg)
![William Rudman (@williamrudmanjr) on Twitter photo How do VLMs like BLIP and LLaVA differ in how they process visual information? Using our mech-interp pipeline for VLMs, NOTICE, we first show important cross-attention heads in BLIP can perform image grounding, whereas important self-attention heads in LLaVA do not. [1/5]](https://pbs.twimg.com/media/Gamh0H3WEAA-Tw0.jpg)
![William Rudman (@williamrudmanjr) on Twitter photo Instead, LLaVA relies on self-attention heads to manage “outlier” attention patterns in the image, focusing on regulating these outliers. Interestingly, some of BLIP's attention heads are also dedicated to reducing attention to outlier features. [2/5]](https://pbs.twimg.com/media/Gamh981X0AA_Vl_.jpg)
![William Rudman (@williamrudmanjr) on Twitter photo NOTICE uses Symmetric Token Replacement for text corruption and Semantic Image Pairs (SIP) for image corruption. SIP replaces clean images with ones differing in a single semantic property, such as object or emotion, enabling meaningful causal mediation analysis of VLMs. [3/5]](https://pbs.twimg.com/media/GamiFHGXwAAYrUk.jpg)
![William Rudman (@williamrudmanjr) on Twitter photo We extend the generalizability of NOTICE by using Stable-Diffusion to generate semantic image pairs and find results are nearly identical to curated semantic image pairs. [4/5]](https://pbs.twimg.com/media/GamiMKEX0AA_5AR.jpg)
![William Rudman (@williamrudmanjr) on Twitter photo When vision-language models answer questions, are they truly analyzing the image or relying on memorized facts? We introduce Pixels vs. Priors (PvP), a method to control whether VLMs respond based on input pixels or world knowledge priors. [1/5]](https://pbs.twimg.com/media/Gsh4zUCXkAATdnQ.jpg)
![William Rudman (@williamrudmanjr) on Twitter photo Models rely on memorized priors early in their processing but shift toward visual evidence in mid-to-late layers. This shows a competition between visual input and stored knowledge, with pixels often overriding priors at the final prediction. [3/5]](https://pbs.twimg.com/media/Gsh5z7cW4AAgqSW.png)
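
The embedded NOTICE thread names its two corruption schemes (Symmetric Token Replacement for text, Semantic Image Pairs for images) but not the patching loop itself. Below is a minimal sketch assuming the standard activation-patching form of causal mediation analysis: cache a head's activation on the clean input, rerun on the SIP-corrupted input with that activation patched in, and score how much of the clean answer is restored. The toy model, `answer_id`, and input tensors are placeholders, not the paper's architecture or data.

```python
# Activation-patching sketch (PyTorch): indirect effect of one "head" on the
# clean-answer logit, using a clean / corrupted input pair.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyModel(nn.Module):
    def __init__(self, d=16, vocab=10):
        super().__init__()
        self.head = nn.Linear(d, d)   # stand-in for one attention head's output
        self.out = nn.Linear(d, vocab)

    def forward(self, x):
        return self.out(self.head(x))

model = ToyModel()
clean = torch.randn(1, 16)     # features for the clean image + question
corrupt = torch.randn(1, 16)   # features for the SIP-corrupted counterpart
answer_id = 3                  # token id of the clean answer (placeholder)

cache = {}

def save_clean(module, inputs, output):
    cache["clean"] = output.detach()   # 1) cache the head's clean activation

def patch_clean(module, inputs, output):
    return cache["clean"]              # 3) overwrite with the clean activation

h = model.head.register_forward_hook(save_clean)
clean_logit = model(clean)[0, answer_id].item()
h.remove()

corrupt_logit = model(corrupt)[0, answer_id].item()   # 2) corrupted baseline

h = model.head.register_forward_hook(patch_clean)
patched_logit = model(corrupt)[0, answer_id].item()
h.remove()

# Fraction of the clean-vs-corrupt gap this head restores when patched.
effect = (patched_logit - corrupt_logit) / (clean_logit - corrupt_logit + 1e-8)
print(f"patching effect for this head: {effect:.2f}")
```

Heads whose patched effect is consistently large across examples would correspond to the "important" heads the thread refers to; the exact scoring used in the paper may differ.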