Praneet (@praneet_suresh_) 's Twitter Profile
Praneet

@praneet_suresh_

ML PhD @Mila_Quebec

ID: 3780161412

Link: https://praneetneuro.github.io | Joined: 04-10-2015 10:47:53

72 Tweets

82 Followers

230 Following

Dillan DiNardo (@dillandinardo) 's Twitter Profile Photo

1/ We're thrilled to announce we've received both FDA & EMA approval to begin human trials of our neurotech platform for designing altered states. It’s time to move beyond “psychedelics” and begin the precision design of mood, cognition, and perception. (1/9)

Dillan DiNardo (@dillandinardo) 's Twitter Profile Photo

9/ We don’t need to eliminate the psychoactive effects of psychedelics. We need the ability to choose the effects that are helpful – to transform these largely effective but unpredictable messes of psychoactive effects into a series of precision-targeted psychoactive profiles. We

Sonia (@soniajoseph_) 's Twitter Profile Photo

I wrote a post on multimodal interpretability techniques, including sparse feature circuit discovery, exploiting the shared text-image space of CLIP, and training adapters. soniajoseph.ai/multimodal-int… Having spent part of the summer in the AI safety sphere in Berkeley, and then
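
One of the techniques mentioned above, exploiting the shared text-image space of CLIP, amounts to scoring image representations directly against text prompts. A minimal sketch, assuming a standard Hugging Face CLIP checkpoint (the model name, image path, and prompts are illustrative, not from the post):

```python
# Hedged sketch: score an image against text prompts in CLIP's shared space.
# Model name, image path, and prompts are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")          # any local image
prompts = ["a photo of a parrot", "a photo of a dog", "a photo of a car"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Cosine-similarity logits between the image embedding and each text embedding;
# the same trick can label internal directions once they are mapped into CLIP space.
probs = out.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```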

Sonia (@soniajoseph_) 's Twitter Profile Photo

Disclaimer: I am not writing this message in connection to my employer, my institution, or any third party. This is a personal judgment call, exercised solely in my own capacity. Over the past few months, I’ve been supporting the victim of a crime perpetrated by an AGI frontier

Sonia (@soniajoseph_) 's Twitter Profile Photo

I wrote a short post after recent convos with researchers misinterpreting the logit lens.

The logit lens can be deceptive because it only shows what already aligns with the output space. Linear probes reveal that meaningful representations emerge in much earlier layers.

ViT accuracy on ImageNet1k:
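
The accuracy chart referenced above is not reproduced here. As a hedged sketch of the comparison it describes, the snippet below applies both readouts to the same intermediate ViT layer: the logit lens (final layernorm plus classifier head applied early) versus a freshly trained linear probe. The checkpoint, layer index, random inputs, and single probe step are all illustrative assumptions.

```python
# Hedged sketch of the logit-lens vs. linear-probe comparison on a ViT.
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.eval()

pixel_values = torch.randn(1, 3, 224, 224)    # stand-in for a real image batch
with torch.no_grad():
    out = model(pixel_values=pixel_values, output_hidden_states=True)

layer = 4                                      # an "early" layer, chosen arbitrarily
hidden = out.hidden_states[layer]              # [batch, tokens, dim]
cls = hidden[:, 0]                             # CLS token activations

# (a) Logit lens: push the intermediate CLS token straight through the final
# layernorm and classification head. This only surfaces whatever is already
# aligned with the output (logit) space.
logit_lens_logits = model.classifier(model.vit.layernorm(cls))

# (b) Linear probe: fit a fresh linear map from the same activations to labels.
# On real data this typically reaches good ImageNet accuracy at layers where
# the logit lens still looks like noise.
probe = torch.nn.Linear(cls.shape[-1], 1000)
labels = torch.randint(0, 1000, (cls.shape[0],))
loss = torch.nn.functional.cross_entropy(probe(cls), labels)
loss.backward()                                # one illustrative probe-training step
```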
Jack Stanley (@jackhtstanley) 's Twitter Profile Photo

Sharing our latest paper, now published in Cell. 

We fine-tuned an LLM to accurately predict autism diagnoses from over 4,200 multi-page clinical text reports.
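
As a hedged sketch of the general recipe, not the paper's actual pipeline or data, fine-tuning a long-context transformer as a binary classifier over clinical text reports can look like the following; the model choice, toy dataset, and hyperparameters are assumptions for illustration.

```python
# Hedged sketch: fine-tune a long-context transformer on clinical reports with
# a binary diagnosis label. Model, fields, and hyperparameters are illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "allenai/longformer-base-4096"    # long-context choice, our assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for the ~4,200 multi-page reports: {"text": report, "label": 0/1}.
data = Dataset.from_dict({
    "text": ["Clinical report text ...", "Another clinical report ..."],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
)
trainer.train()
```
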
Samuel Marks (@saprmarks) 's Twitter Profile Photo

Neel Nanda FWIW I disagree that sparse probing experiments test the "representing concepts crisply" and "identify a complete decomposition" claims about SAEs. In other words, I expect that—even if SAEs perfectly decomposed LLM activations into human-understandable latents with nothing

Sonia (@soniajoseph_) 's Twitter Profile Photo

We visualized the features of 16 SAEs trained on CLIP in a collaboration between Fraunhofer HHI and Mila - Institut québécois d'IA! Search thousands of interpretable CLIP features in our vision atlas, with autointerp labels, & scores like clarity and polysemanticity. Some fun features in thread:

Sonia (@soniajoseph_) 's Twitter Profile Photo

Our paper Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video received an Oral at the Mechanistic Interpretability for Vision Workshop at CVPR 2025! 🎉

We’ll be in Nashville next week. Come say hi 👋

<a href="/CVPR/">#CVPR2025</a>  <a href="/miv_cvpr2025/">Mechanistic Interpretability for Vision @ CVPR2025</a>
Sonia (@soniajoseph_) 's Twitter Profile Photo

We are releasing:
- 80+ SAEs covering every layer of CLIP and DINO, plus CLIP transcoders
- Transformer-Lens-style circuit code for 100+ models, including CLIP, DINO, and video transformers from Hugging Face & OpenCLIP
- Interactive notebooks for training and evaluating sparse autoencoders
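
Prisma's own API is not shown here; as a rough sketch of what "Transformer-Lens-style" caching means in practice, the snippet below registers plain PyTorch hooks on a Hugging Face CLIP vision tower and caches each block's residual-stream output. The hook names are illustrative, not Prisma's.

```python
# Hedged sketch: cache per-layer activations via forward hooks so circuits and
# SAEs can read them. Plain PyTorch hooks on a HF CLIP vision tower; Prisma's
# real interface likely wraps this more conveniently and may differ in names.
import torch
from transformers import CLIPVisionModel

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

cache = {}
def save_hook(name):
    def hook(module, inputs, output):
        # Each CLIPEncoderLayer returns a tuple; element 0 is the residual stream.
        cache[name] = output[0].detach()
    return hook

handles = [
    layer.register_forward_hook(save_hook(f"blocks.{i}.hook_resid_post"))
    for i, layer in enumerate(model.vision_model.encoder.layers)
]

with torch.no_grad():
    model(pixel_values=torch.randn(1, 3, 224, 224))   # stand-in for real images

for h in handles:
    h.remove()

print({k: tuple(v.shape) for k, v in cache.items()})  # [batch, tokens, d_model] per layer
```
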
Sonia (@soniajoseph_) 's Twitter Profile Photo

Some of the Prisma SAE features get odd in the best way: "twin objects," "outlier/chosen object," and Oktoberfest-related concepts. We visualize the features here, in collaboration with Fraunhofer HHI: semanticlens.hhi-research-insights.de/umap-view

Sonia (@soniajoseph_) 's Twitter Profile Photo

Some interesting observations:

Vision SAEs differ from language SAEs, e.g. in token-specific specialization.

CLS-token sparsity gradually decreases as global information accumulates. We plot the % of alive SAE features by layer, showing how the network constructs a global representation.
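
A hedged sketch of the "% of alive SAE features by layer" measurement: a feature counts as alive if it fires above a small threshold anywhere in an evaluation set. The random codes below stand in for real per-layer SAE activations; shapes and the threshold are illustrative.

```python
# Hedged sketch: fraction of alive SAE features per layer over an eval set.
import torch

n_layers, n_tokens, n_features = 12, 2_000, 4_096
threshold = 1e-6

alive_frac = []
for layer in range(n_layers):
    # Stand-in for this layer's SAE codes: [tokens, features], mostly zeros.
    codes = torch.relu(torch.randn(n_tokens, n_features) - 3.0)
    fired = (codes > threshold).any(dim=0)      # did the feature ever fire?
    alive_frac.append(fired.float().mean().item())

for layer, frac in enumerate(alive_frac):
    print(f"layer {layer:2d}: {100 * frac:5.1f}% alive features")
```
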
Sonia (@soniajoseph_) 's Twitter Profile Photo

Surprisingly, SAEs can denoise vision tokens and reduce loss in vision.

This is rare in language, but vision is a more redundant modality, suggesting that SAEs may help not just interpret but also _improve_ representations.
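
A minimal sketch of how such a claim can be tested: patch one layer's activations with their SAE reconstruction and compare a downstream quantity against the clean run. The TinySAE below is an untrained stand-in for a trained Prisma SAE, and the layer index and random inputs are illustrative.

```python
# Hedged sketch: replace one layer's output with its SAE reconstruction and
# compare against the clean forward pass.
import torch
import torch.nn as nn
from transformers import CLIPVisionModelWithProjection

class TinySAE(nn.Module):
    def __init__(self, d_model=768, d_sae=8 * 768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))   # reconstruction of the activations

model = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
model.eval()
sae = TinySAE()                                     # untrained stand-in

def patch_hook(module, inputs, output):
    # Swap the layer's output activations for their SAE reconstruction.
    return (sae(output[0]),) + output[1:]

pixels = torch.randn(1, 3, 224, 224)                # stand-in for real images
with torch.no_grad():
    clean = model(pixel_values=pixels).image_embeds
    handle = model.vision_model.encoder.layers[6].register_forward_hook(patch_hook)
    patched = model(pixel_values=pixels).image_embeds
    handle.remove()

# With a trained SAE, the reported effect is that this gap can be small, and the
# patched run can even score better on downstream loss for vision models.
print("embedding drift:", (clean - patched).norm().item())
```
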
Sonia (@soniajoseph_) 's Twitter Profile Photo

While interest in SAEs has declined in language, we find their use in vision promising. We're especially interested in rigorously understanding the first principles and limitations of sparse coders, with Prisma as one step toward doing that in the open.

Sonia (@soniajoseph_) 's Twitter Profile Photo

Prisma is our open-source toolkit for mechanistic interpretability in vision and video-- built to make circuits, SAEs, and transcoders easier to study. Whitepaper: arxiv.org/abs/2504.19475 Repo: github.com/Prisma-Multimo… Let us know what you build with it!

Sonia (@soniajoseph_) 's Twitter Profile Photo

as a weekend project, I made a video overview of vision sparse autoencoders, covering their history, recent negative results, future directions, and a demo of running an image of a parrot through an SAE to explore its features.

(link below)
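
For reference, the parrot demo amounts to encoding an image with CLIP, mapping one layer's activations through the SAE encoder, and inspecting the strongest features. A hedged sketch with an untrained stand-in encoder (the image path, checkpoint, and layer index are illustrative):

```python
# Hedged sketch: run an image through CLIP, encode one layer with an SAE
# encoder, and list the top-activating features.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

d_model, d_sae = 768, 8 * 768
encoder = nn.Linear(d_model, d_sae)                 # SAE encoder (untrained stand-in)

image = Image.open("parrot.jpg")                    # any image will do
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    hidden = model(pixel_values=pixels, output_hidden_states=True).hidden_states[6]
    codes = torch.relu(encoder(hidden))             # [batch, tokens, d_sae] activations

# Top features on the CLS token; with a trained SAE these map to autointerp labels.
top = codes[0, 0].topk(5)
for idx, val in zip(top.indices.tolist(), top.values.tolist()):
    print(f"feature {idx}: activation {val:.3f}")
```
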
Sonia (@soniajoseph_) 's Twitter Profile Photo

SAEs can have a "LoRA-like" effect.

ImageNet accuracy improves if you train a CLIP SAE only on ImageNet, and then filter the embeddings through the SAE.
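
A hedged sketch of the evaluation this implies: classify CLIP image embeddings directly, then classify their SAE reconstructions, and compare accuracy. Both the SAE and the linear classifier below are untrained stand-ins; in the real setup the SAE is trained only on ImageNet and the classifier is fit before comparing.

```python
# Hedged sketch: compare classification on raw vs. SAE-filtered CLIP embeddings.
import torch
import torch.nn as nn

d_model, d_sae, n_classes, n_images = 512, 8 * 512, 1000, 256

sae_enc, sae_dec = nn.Linear(d_model, d_sae), nn.Linear(d_sae, d_model)
classifier = nn.Linear(d_model, n_classes)          # untrained stand-in classifier

embeds = torch.randn(n_images, d_model)             # stand-in for CLIP image embeddings
labels = torch.randint(0, n_classes, (n_images,))

def accuracy(x):
    return (classifier(x).argmax(dim=-1) == labels).float().mean().item()

with torch.no_grad():
    baseline = accuracy(embeds)
    filtered = accuracy(sae_dec(torch.relu(sae_enc(embeds))))   # "filter through the SAE"

# The reported effect: the filtered (reconstructed) embeddings score higher,
# acting like a lightweight, dataset-specific adapter.
print(f"baseline: {baseline:.3f}  |  SAE-filtered: {filtered:.3f}")
```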