Michal Golovanevsky (@michalgolov)'s Twitter Profile
Michal Golovanevsky

@michalgolov

CS PhD student @BrownCSDept | Multimodal Learning | Mechanistic Interpretability | Clinical Deep Learning.

ID: 1573399875278049280

Link: https://github.com/michalg04 | Joined: 23-09-2022 19:52:33

22 Tweets

32 Followers

42 Following

William Rudman (@williamrudmanjr):

By visualizing cross-attention patterns, we've discovered that these universal heads fall into three functional categories: implicit image segmentation, object inhibition, and outlier inhibition [4/5].

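A minimal sketch of the kind of visualization this refers to: given one head's cross-attention weights from text tokens to image patches, average over text tokens and reshape to the patch grid to see which image regions the head attends to. The array here is a synthetic placeholder (extracting the real weights depends on the model's API), and the 24x24 patch grid is an assumption, not the paper's exact setup.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder: one head's cross-attention weights, shape (num_text_tokens, num_image_patches).
# In practice you would extract this from the model; random values keep the sketch runnable.
num_text_tokens, grid = 12, 24            # 24x24 = 576 image patches (assumed patch grid)
attn = np.random.rand(num_text_tokens, grid * grid)
attn /= attn.sum(axis=-1, keepdims=True)  # normalize rows like softmax output

# Average attention over text tokens and reshape to the image-patch grid.
head_map = attn.mean(axis=0).reshape(grid, grid)

plt.imshow(head_map, cmap="viridis")
plt.title("Cross-attention head: attention over image patches")
plt.colorbar(label="attention weight")
plt.show()
```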
William Rudman (@williamrudmanjr):

The finding that important cross-attention heads implement one of a small set of interpretable functions helps boost VLMs' transparency and trust. Paper: export.arxiv.org/pdf/2406.16320 GitHub: github.com/wrudman/NOTICE… [5/5].

William Rudman (@williamrudmanjr):

How do VLMs like BLIP and LLaVA differ in how they process visual information? Using our mech-interp pipeline for VLMs, NOTICE, we first show important cross-attention heads in BLIP can perform image grounding, whereas important self-attention heads in LLaVA do not. [1/5]

William Rudman (@williamrudmanjr):

Instead, LLaVA relies on self-attention heads whose main role is to regulate “outlier” attention patterns in the image. Interestingly, some of BLIP's attention heads are also dedicated to reducing attention to outlier features. [2/5]

William Rudman (@williamrudmanjr):

NOTICE uses Symmetric Token Replacement for text corruption and Semantic Image Pairs (SIP) for image corruption. SIP replaces clean images with ones differing in a single semantic property, such as object or emotion, enabling meaningful causal mediation analysis of VLMs. [3/5]

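For intuition, here is a generic activation-patching sketch in the spirit of causal mediation analysis: run the model on the corrupted (STR- or SIP-style) input, splice in one attention module's output cached from the clean run, and see how much of the clean answer's logit is recovered. The `model`, `module_name`, and answer-token bookkeeping are placeholders; NOTICE's actual pipeline lives in the linked GitHub repo.

```python
import torch

def patch_module_output(model, module_name, clean_inputs, corrupt_inputs, answer_token_id):
    """Causal-mediation-style sketch: cache one module's output on the clean input,
    patch it into a corrupted forward pass, and compare answer logits."""
    module = dict(model.named_modules())[module_name]
    cache = {}

    def save_hook(mod, inp, out):
        cache["clean"] = out[0] if isinstance(out, tuple) else out

    def patch_hook(mod, inp, out):
        patched = cache["clean"]
        return (patched,) + out[1:] if isinstance(out, tuple) else patched

    with torch.no_grad():
        handle = module.register_forward_hook(save_hook)
        model(**clean_inputs)                                    # clean run: cache activation
        handle.remove()

        corrupt_logits = model(**corrupt_inputs).logits[0, -1]   # corrupted baseline

        handle = module.register_forward_hook(patch_hook)
        patched_logits = model(**corrupt_inputs).logits[0, -1]   # corrupted run + clean activation
        handle.remove()

    # How much of the clean answer does patching this one module restore?
    return (patched_logits[answer_token_id] - corrupt_logits[answer_token_id]).item()
```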
William Rudman (@williamrudmanjr):

We extend the generalizability of NOTICE by using Stable Diffusion to generate semantic image pairs, and find that the results are nearly identical to those from curated semantic image pairs. [4/5]

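A rough sketch of generating such a pair with Hugging Face diffusers: same prompt template and same random seed, with only one semantic property (here the object's color) changed. The checkpoint and prompts below are illustrative assumptions, not the authors' exact generation setup.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = ["a photo of a red strawberry on a table",   # clean image
           "a photo of a blue strawberry on a table"]  # differs in one semantic property

pair = []
for prompt in prompts:
    # Re-seeding for each prompt keeps everything except the edited attribute as similar as possible.
    generator = torch.Generator("cuda").manual_seed(0)
    pair.append(pipe(prompt, generator=generator).images[0])

pair[0].save("clean.png")
pair[1].save("counterfactual.png")
```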
William Rudman (@williamrudmanjr):

The finding that important attention heads implement one of a small set of interpretable functions boosts transparency and trust in VLMs. Michal Golovanevsky Vedant Palit #nlp #mechinterp Paper: export.arxiv.org/pdf/2406.16320 GitHub: github.com/wrudman/NOTICE… [5/5]

William Rudman (@williamrudmanjr):

When vision-language models answer questions, are they truly analyzing the image or relying on memorized facts? We introduce Pixels vs. Priors (PvP), a method to control whether VLMs respond based on input pixels or world knowledge priors. [1/5]

William Rudman (@williamrudmanjr):

We create Visual CounterFact: a dataset of realistic images that contrast pixel evidence against memorized knowledge. We edit visual attributes to create counterfactual images (a blue strawberry) that directly contradict typical associations (strawberries are red). [2/5]

William Rudman (@williamrudmanjr):

Models rely on memorized priors early in their processing but shift toward visual evidence in mid-to-late layers. This shows a competition between visual input and stored knowledge, with pixels often overriding priors at the final prediction. [3/5]

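One common way to observe this kind of layer-wise shift is a logit-lens-style probe: decode each layer's hidden state at the answer position through the model's final norm and unembedding, and track whether the prior answer (e.g., "red") or the counterfactual answer (e.g., "blue") is winning. This is a generic sketch, not necessarily the paper's exact analysis; `final_norm` and `lm_head` stand in for whatever those modules are called in your model.

```python
import torch

def prior_vs_pixels_by_layer(hidden_states, final_norm, lm_head,
                             prior_token_id, counterfactual_token_id):
    """Logit-lens-style probe: for each layer, decode the last position's hidden state
    and report the logit gap (counterfactual minus prior). Positive = pixels winning."""
    gaps = []
    with torch.no_grad():
        for layer_hidden in hidden_states:            # tuple from output_hidden_states=True
            logits = lm_head(final_norm(layer_hidden[0, -1]))
            gaps.append((logits[counterfactual_token_id] - logits[prior_token_id]).item())
    return gaps  # expect negative (prior wins) early, positive (pixels win) in mid-to-late layers
```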
William Rudman (@williamrudmanjr):

With PvP, we can shift 92.5% of color predictions and 74.6% of size predictions from memorized priors to counterfactual answers. Code: github.com/rsinghlab/pixe… HuggingFace Dataset: mgolov/Visual-Counterfact [5/5]
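Loading the dataset should be a standard datasets call; the dataset name is taken from the tweet, while the split and column names printed below are unknown here, so inspect the dataset card rather than relying on this sketch.

```python
from datasets import load_dataset

# Dataset name from the tweet; splits and features are assumptions - check the HF card.
ds = load_dataset("mgolov/Visual-Counterfact")
print(ds)                                   # inspect available splits and features
example = ds[list(ds.keys())[0]][0]
print(example.keys())                       # e.g., image, question, prior and counterfactual answers
```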

Michael Lepori (@michael_lepori):

How do VLMs balance visual information presented in-context with linguistic priors encoded in-weights? In this project, Michal Golovanevsky and William Rudman find out! My favorite result: you can find a vector that shifts attention to image tokens and changes the VLM's response!
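A generic residual-stream steering sketch of the kind of intervention described here: add a fixed vector to one layer's output on every forward pass via a hook, then generate and compare responses. How the PvP vector itself is computed (and which layer to target) is the paper's contribution; the module path below is a hypothetical example.

```python
import torch

def add_steering_vector(model, layer_module_name, steering_vector, alpha=1.0):
    """Steering sketch: add a fixed vector to a layer's output (residual stream)
    on every forward pass. Returns the hook handle so the edit can be removed."""
    layer = dict(model.named_modules())[layer_module_name]

    def hook(mod, inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        steered = hidden + alpha * steering_vector.to(hidden.device, hidden.dtype)
        return (steered,) + out[1:] if isinstance(out, tuple) else steered

    return layer.register_forward_hook(hook)

# Usage sketch (module path is hypothetical):
# handle = add_steering_vector(vlm, "language_model.model.layers.20", v)
# ... run generation, then handle.remove() to restore the unmodified model.
```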