
Daking Rai
@dakingrai
CS PhD Student @GeorgeMasonU
ID: 2828986548
https://dakingrai.github.io/
24-09-2014 00:53:57
29 Tweets
179 Followers
305 Following

[1/6] Mechanistic Interpretability (MI) is an emerging sub-field of interpretability that aims to understand language models (LMs) by reverse-engineering their underlying computation. Here we present a comprehensive survey curated specifically as a **guide for newcomers to this**…
![Daking Rai (@dakingrai) on Twitter photo](https://pbs.twimg.com/media/GR-pfUqWEAAZWgb.png)