Praneet (@praneet_suresh_) 's Twitter Profile
Praneet

@praneet_suresh_

ML PhD @Mila_Quebec

ID: 3780161412

Link: https://praneetneuro.github.io | Joined: 04-10-2015 10:47:53

72 Tweets

82 Followers

230 Following

Dillan DiNardo (@dillandinardo) 's Twitter Profile Photo

1/ We're thrilled to announce we've received both FDA & EMA approval to begin human trials of our neurotech platform for designing altered states. It’s time to move beyond “psychedelics” and begin the precision design of mood, cognition, and perception. (1/9)

Dillan DiNardo (@dillandinardo) 's Twitter Profile Photo

9/ We don’t need to eliminate the psychoactive effects of psychedelics. We need the ability to choose the effects that are helpful – to transform these largely effective but unpredictable messes of psychoactive effects into a series of precision-targeted psychoactive profiles. We

Sonia (@soniajoseph_) 's Twitter Profile Photo

I wrote a post on multimodal interpretability techniques, including sparse feature circuit discovery, exploiting the shared text-image space of CLIP, and training adapters. soniajoseph.ai/multimodal-int… Having spent part of the summer in the AI safety sphere in Berkeley, and then
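
One of the techniques mentioned above, exploiting the shared text-image space of CLIP, amounts to scoring image representations directly against text prompts. A minimal sketch, assuming a standard Hugging Face CLIP checkpoint (the model name, image path, and prompts are illustrative, not from the post):

```python
# Hedged sketch: score an image against text prompts in CLIP's shared space.
# Model name, image path, and prompts are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")          # any local image
prompts = ["a photo of a parrot", "a photo of a dog", "a photo of a car"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Cosine-similarity logits between the image embedding and each text embedding;
# the same trick can label internal directions once they are mapped into CLIP space.
probs = out.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```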

Sonia (@soniajoseph_) 's Twitter Profile Photo

Disclaimer: I am not writing this message in connection to my employer, my institution, or any third party. This is a personal judgment call, exercised solely in my own capacity. Over the past few months, I’ve been supporting the victim of a crime perpetrated by an AGI frontier

Sonia (@soniajoseph_) 's Twitter Profile Photo

I wrote a short post after recent convos with researchers misinterpreting the logit lens.

The logit lens can be deceptive because it only shows what already aligns with the output space. Linear probes reveal that meaningful representations emerge in much earlier layers.

ViT accuracy on ImageNet1k:
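
The accuracy chart referenced above is not reproduced here. As a hedged sketch of the comparison it describes, the snippet below applies both readouts to the same intermediate ViT layer: the logit lens (final layernorm plus classifier head applied early) versus a freshly trained linear probe. The checkpoint, layer index, random inputs, and single probe step are all illustrative assumptions.

```python
# Hedged sketch of the logit-lens vs. linear-probe comparison on a ViT.
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.eval()

pixel_values = torch.randn(1, 3, 224, 224)    # stand-in for a real image batch
with torch.no_grad():
    out = model(pixel_values=pixel_values, output_hidden_states=True)

layer = 4                                      # an "early" layer, chosen arbitrarily
hidden = out.hidden_states[layer]              # [batch, tokens, dim]
cls = hidden[:, 0]                             # CLS token activations

# (a) Logit lens: push the intermediate CLS token straight through the final
# layernorm and classification head. This only surfaces whatever is already
# aligned with the output (logit) space.
logit_lens_logits = model.classifier(model.vit.layernorm(cls))

# (b) Linear probe: fit a fresh linear map from the same activations to labels.
# On real data this typically reaches good ImageNet accuracy at layers where
# the logit lens still looks like noise.
probe = torch.nn.Linear(cls.shape[-1], 1000)
labels = torch.randint(0, 1000, (cls.shape[0],))
loss = torch.nn.functional.cross_entropy(probe(cls), labels)
loss.backward()                                # one illustrative probe-training step
```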
Jack Stanley (@jackhtstanley) 's Twitter Profile Photo

Sharing our latest paper, now published in Cell. 

We fine-tuned an LLM to accurately predict autism diagnoses from over 4,200 multi-page clinical text reports.
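
As a hedged sketch of the general recipe, not the paper's actual pipeline or data, fine-tuning a long-context transformer as a binary classifier over clinical text reports can look like the following; the model choice, toy dataset, and hyperparameters are assumptions for illustration.

```python
# Hedged sketch: fine-tune a long-context transformer on clinical reports with
# a binary diagnosis label. Model, fields, and hyperparameters are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "allenai/longformer-base-4096"    # long-context choice, our assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for the ~4,200 multi-page reports: {"text": report, "label": 0/1}.
data = Dataset.from_dict({
    "text": ["Clinical report text ...", "Another clinical report ..."],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
)
trainer.train()
```
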
Samuel Marks (@saprmarks) 's Twitter Profile Photo

Neel Nanda FWIW I disagree that sparse probing experiments test the "representing concepts crisply" and "identify a complete decomposition" claims about SAEs. In other words, I expect that—even if SAEs perfectly decomposed LLM activations into human-understandable latents with nothing

Sonia (@soniajoseph_) 's Twitter Profile Photo

We visualized the features of 16 SAEs trained on CLIP in a collaboration between Fraunhofer HHI and Mila - Institut québécois d'IA! Search thousands of interpretable CLIP features in our vision atlas, with autointerp labels, & scores like clarity and polysemanticity. Some fun features in thread:

Sonia (@soniajoseph_) 's Twitter Profile Photo

Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉

We’ll be in Nashville next week. Come say hi 👋

<a href="/CVPR/">#CVPR2025</a>  <a href="/miv_cvpr2025/">Mechanistic Interpretability for Vision @ CVPR2025</a>
Sonia (@soniajoseph_) 's Twitter Profile Photo

We are releasing:
- 80+ SAEs covering every layer of CLIP and DINO, plus CLIP transcoders
- Transformer-Lens-style circuit code for 100+ models, including CLIP, DINO, and video transformers from Hugging Face & OpenCLIP
- Interactive notebooks for training and evaluating sparse autoencoders
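
Prisma's own API is not shown here; as a rough sketch of what "Transformer-Lens-style" caching means in practice, the snippet below registers plain PyTorch hooks on a Hugging Face CLIP vision tower and caches each block's residual-stream output. The hook names are illustrative, not Prisma's.

```python
# Hedged sketch: cache per-layer activations via forward hooks so circuits and
# SAEs can read them. Plain PyTorch hooks on a HF CLIP vision tower; Prisma's
# real interface likely wraps this more conveniently and may differ in names.
import torch
from transformers import CLIPVisionModel

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

cache = {}
def save_hook(name):
    def hook(module, inputs, output):
        # Each CLIPEncoderLayer returns a tuple; element 0 is the residual stream.
        cache[name] = output[0].detach()
    return hook

handles = [
    layer.register_forward_hook(save_hook(f"blocks.{i}.hook_resid_post"))
    for i, layer in enumerate(model.vision_model.encoder.layers)
]

with torch.no_grad():
    model(pixel_values=torch.randn(1, 3, 224, 224))   # stand-in for real images

for h in handles:
    h.remove()

print({k: tuple(v.shape) for k, v in cache.items()})  # [batch, tokens, d_model] per layer
```
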
Sonia (@soniajoseph_) 's Twitter Profile Photo

Some of the Prisma SAE features get odd in the best way: "twin objects," "outlier/chosen object," and Oktoberfest-related concepts. We visualize the features here, in collaboration with Fraunhofer HHI: semanticlens.hhi-research-insights.de/umap-view

Sonia (@soniajoseph_) 's Twitter Profile Photo

Some interesting observations:

Vision SAEs differ from language SAEs, e.g. in token-specific specialization.

CLS-token sparsity gradually decreases as global information accumulates. We plot the % of alive SAE features by layer, showing how the network constructs a global representation.
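
A hedged sketch of the "% of alive SAE features by layer" measurement: a feature counts as alive if it fires above a small threshold anywhere in an evaluation set. The random codes below stand in for real per-layer SAE activations; shapes and the threshold are illustrative.

```python
# Hedged sketch: fraction of alive SAE features per layer over an eval set.
import torch

n_layers, n_tokens, n_features = 12, 2_000, 4_096
threshold = 1e-6

alive_frac = []
for layer in range(n_layers):
    # Stand-in for this layer's SAE codes: [tokens, features], mostly zeros.
    codes = torch.relu(torch.randn(n_tokens, n_features) - 3.0)
    fired = (codes > threshold).any(dim=0)      # did the feature ever fire?
    alive_frac.append(fired.float().mean().item())

for layer, frac in enumerate(alive_frac):
    print(f"layer {layer:2d}: {100 * frac:5.1f}% alive features")
```
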
Sonia (@soniajoseph_) 's Twitter Profile Photo

Surprisingly, SAEs can denoise vision tokens and reduce loss in vision.

This is rare in language, but vision is a more redundant modality, suggesting that SAEs may help not just interpret but also _improve_ representations.
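
A minimal sketch of how such a claim can be tested: patch one layer's activations with their SAE reconstruction and compare a downstream quantity against the clean run. The TinySAE below is an untrained stand-in for a trained Prisma SAE, and the layer index and random inputs are illustrative.

```python
# Hedged sketch: replace one layer's output with its SAE reconstruction and
# compare against the clean forward pass.
import torch
import torch.nn as nn
from transformers import CLIPVisionModelWithProjection

class TinySAE(nn.Module):
    def __init__(self, d_model=768, d_sae=8 * 768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))   # reconstruction of the activations

model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
model.eval()
sae = TinySAE()                                     # untrained stand-in

def patch_hook(module, inputs, output):
    # Swap the layer's output activations for their SAE reconstruction.
    return (sae(output[0]),) + output[1:]

pixels = torch.randn(1, 3, 224, 224)                # stand-in for real images
with torch.no_grad():
    clean = model(pixel_values=pixels).image_embeds
    handle = model.vision_model.encoder.layers[6].register_forward_hook(patch_hook)
    patched = model(pixel_values=pixels).image_embeds
    handle.remove()

# With a trained SAE, the reported effect is that this gap can be small, and the
# patched run can even score better on downstream loss for vision models.
print("embedding drift:", (clean - patched).norm().item())
```
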
Sonia (@soniajoseph_) 's Twitter Profile Photo

While interest in SAEs has declined in language, we find their use in vision promising. We're especially interested in rigorously understanding the first principles and limitations of sparse coders, with Prisma as one step toward doing that in the open.

Sonia (@soniajoseph_) 's Twitter Profile Photo

Prisma is our open-source toolkit for mechanistic interpretability in vision and video-- built to make circuits, SAEs, and transcoders easier to study. Whitepaper: arxiv.org/abs/2504.19475 Repo: github.com/Prisma-Multimo… Let us know what you build with it!

Sonia (@soniajoseph_) 's Twitter Profile Photo

as a weekend project, I made a video overview of vision sparse autoencoders, covering their history, recent negative results, future directions, and a demo of running an image of a parrot through an SAE to explore its features.

(link below)
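
For reference, the parrot demo amounts to encoding an image with CLIP, mapping one layer's activations through the SAE encoder, and inspecting the strongest features. A hedged sketch with an untrained stand-in encoder (the image path, checkpoint, and layer index are illustrative):

```python
# Hedged sketch: run an image through CLIP, encode one layer with an SAE
# encoder, and list the top-activating features.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

d_model, d_sae = 768, 8 * 768
encoder = nn.Linear(d_model, d_sae)                 # SAE encoder (untrained stand-in)

image = Image.open("parrot.jpg")                    # any image will do
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    hidden = model(pixel_values=pixels, output_hidden_states=True).hidden_states[6]
    codes = torch.relu(encoder(hidden))             # [batch, tokens, d_sae] activations

# Top features on the CLS token; with a trained SAE these map to autointerp labels.
top = codes[0, 0].topk(5)
for idx, val in zip(top.indices.tolist(), top.values.tolist()):
    print(f"feature {idx}: activation {val:.3f}")
```
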
Sonia (@soniajoseph_) 's Twitter Profile Photo

SAEs can have a "LoRA-like" effect.

ImageNet accuracy improves if you train a CLIP SAE only on ImageNet, and then filter the embeddings through the SAE.
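
A hedged sketch of the evaluation this implies: classify CLIP image embeddings directly, then classify their SAE reconstructions, and compare accuracy. Both the SAE and the linear classifier below are untrained stand-ins; in the real setup the SAE is trained only on ImageNet and the classifier is fit before comparing.

```python
# Hedged sketch: compare classification on raw vs. SAE-filtered CLIP embeddings.
import torch
import torch.nn as nn

d_model, d_sae, n_classes, n_images = 512, 8 * 512, 1000, 256

sae_enc, sae_dec = nn.Linear(d_model, d_sae), nn.Linear(d_sae, d_model)
classifier = nn.Linear(d_model, n_classes)          # untrained stand-in classifier

embeds = torch.randn(n_images, d_model)             # stand-in for CLIP image embeddings
labels = torch.randint(0, n_classes, (n_images,))

def accuracy(x):
    return (classifier(x).argmax(dim=-1) == labels).float().mean().item()

with torch.no_grad():
    baseline = accuracy(embeds)
    filtered = accuracy(sae_dec(torch.relu(sae_enc(embeds))))   # "filter through the SAE"

# The reported effect: the filtered (reconstructed) embeddings score higher,
# acting like a lightweight, dataset-specific adapter.
print(f"baseline: {baseline:.3f}  |  SAE-filtered: {filtered:.3f}")
```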