Fazl Barez (@fazlbarez)'s Twitter Profile
Fazl Barez

@fazlbarez

Making AI safe one Google doc at a time | Let's build AIs we can trust!

ID: 1341019917005537280

Link: https://fbarez.github.io
Joined: 21-12-2020 13:57:26

464 Tweets

1.1K Followers

729 Following

New paper🚨

Enhancing Interpretability with Feature-Aligned Sparse Autoencoders

SAEs help us understand neural networks by learning sparse representations of features, but they can also learn features that are not present in the network they were trained on.

Mutual Feature Regularization mitigates this!
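
For readers new to SAEs, here is a minimal sketch of a generic sparse autoencoder objective: reconstruction error plus an L1 penalty on the hidden code. It is not the paper's Mutual Feature Regularization, whose details are not given in the tweet. The function names, the weights, and the `l1_coeff` value are all illustrative assumptions.

```python
# Generic sparse autoencoder loss in pure Python (illustrative only;
# this is NOT the Mutual Feature Regularization method from the paper).

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    """Elementwise ReLU."""
    return [max(0.0, x) for x in v]

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=0.1):
    """Return (loss, hidden_code, reconstruction) for one input vector x.

    loss = mean squared reconstruction error + l1_coeff * sum(|hidden_code|)
    """
    # Encode: hidden code h = ReLU(W_enc x + b_enc)
    h = relu([p + b for p, b in zip(matvec(W_enc, x), b_enc)])
    # Decode: reconstruction x_hat = W_dec h + b_dec
    x_hat = [p + b for p, b in zip(matvec(W_dec, h), b_dec)]
    recon = sum((a - r) ** 2 for a, r in zip(x, x_hat)) / len(x)
    sparsity = sum(abs(a) for a in h)
    return recon + l1_coeff * sparsity, h, x_hat

# Hypothetical toy weights: 2-dimensional input, 3 hidden features.
x = [1.0, -2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

loss, h, x_hat = sae_loss(x, W_enc, b_enc, W_dec, b_dec)
```

In this toy case only the first hidden feature is active, so the code is sparse but the negative input component is not reconstructed. Nothing in the objective forces the hidden features to match features the original network actually uses, which is the gap the tweet describes.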