Fazl Barez (@fazlbarez)'s Twitter Profile
Fazl Barez

@fazlbarez

Making AI safe one Google doc at a time | Let's build AIs we can trust!

ID: 1341019917005537280

Link: https://fbarez.github.io

Joined: 21-12-2020 13:57:26

464 Tweets

1.1K Followers

729 Following

New paper alert! 🚨

Important question: Do SAEs generalise? 
We explore answerability detection in LLMs by comparing SAE features vs. linear residual stream probes.

Answer:
probes outperform SAE features in-domain; out-of-domain generalization varies sharply between