Fazl Barez

@fazlbarez


Making AI safe one Google doc at a time | Let's build AIs we can trust!

ID: 1341019917005537280

Link: https://fbarez.github.io

Joined: 21-12-2020 13:57:26

464 Tweets

1.1K Followers

729 Following

Fazl Barez

@fazlbarez

9 months ago

New paper 🚨 Enhancing Interpretability with Feature-Aligned Sparse Autoencoders

SAEs help us understand NNs by learning sparse representations of features, but they can learn features not in the neural network they were trained on.

Mutual Feature Regularization mitigates this!

Likes: 85

Replies: 2

Retweets: 19
