Zach Furman (@furmanzach)'s Twitter Profile
Zach Furman

@furmanzach

Singular learning theory and AI alignment research. Previously embedded SWE, aerospace, and physics.

ID: 1591503649729036288

Link: http://zachfurman.com · Joined: 12-11-2022 18:50:39

12 Tweets

111 Followers

169 Following

Aran Komatsuzaki (@arankomatsuzaki):

Eliciting Latent Predictions from Transformers with the Tuned Lens

Analyzes transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

repo: github.com/AlignmentResea…
abs: arxiv.org/abs/2303.08112
Nora Belrose (@norabelrose):

Ever wonder how a language model decides what to say next?

Our method, the tuned lens (arxiv.org/abs/2303.08112), can trace an LM’s prediction as it develops from one layer to the next. It's more reliable and applies to more models than prior state-of-the-art. 🧵
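For a concrete picture of what the tuned lens does mechanically, here is an illustrative sketch (not the authors' implementation; that lives in the linked repo). It decodes each layer's hidden state through a per-layer affine "translator" followed by the model's final layer norm and unembedding. With the identity-initialized, untrained translators used here, it reduces to the classic logit-lens baseline; the tuned lens trains these translators to match the model's final-layer distribution.

```python
# Illustrative sketch of the tuned-lens idea, NOT the authors' code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM that exposes hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

d, n_layers = model.config.hidden_size, model.config.n_layer

# One affine translator per layer. The tuned lens trains these to minimize
# KL against the model's final-layer distribution; initialized to the
# identity here, the sketch degenerates to the logit-lens baseline.
translators = []
for _ in range(n_layers):
    lin = torch.nn.Linear(d, d)
    torch.nn.init.eye_(lin.weight)
    torch.nn.init.zeros_(lin.bias)
    translators.append(lin)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
    # Decode the last position's hidden state at every layer.
    for layer, h in enumerate(out.hidden_states[1:], start=1):
        h_last = translators[layer - 1](h[0, -1])  # affine translator
        # ln_f / lm_head use GPT-2 naming; other architectures differ.
        logits = model.lm_head(model.transformer.ln_f(h_last))
        print(f"layer {layer:2d}: {tok.decode([logits.argmax().item()])!r}")
```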
Daniel Murfet (@danielmurfet):

Timaeus is a new research organization dedicated to making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. Led by Jesse Hoogland, Consistently Candid Alex, Stan van Wingerden, and myself. lesswrong.com/posts/nN7bHuHZ… [1/n]
Jesse Hoogland (@jesse_hoogland):

1/8 How do transformers learn? In our new work, we find that transformers develop in-context learning in discrete stages that can be automatically discovered. 🧵

arxiv.org/abs/2402.02364

Joint work w/ george, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei, Daniel Murfet
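As a rough illustration of "stages that can be automatically discovered": the paper draws on singular learning theory (the local learning coefficient) to locate stage boundaries, but the toy sketch below only shows the generic shape of the idea, segmenting any per-checkpoint scalar wherever its smoothed slope changes sign. All names and thresholds are placeholders, not the paper's method.

```python
# Toy illustration only, not the paper's stage-discovery procedure.
import numpy as np

def stage_boundaries(values: np.ndarray, window: int = 5, tol: float = 1e-3):
    """Indices where the smoothed slope of `values` flips sign."""
    smooth = np.convolve(values, np.ones(window) / window, mode="valid")
    boundaries, last_sign = [], 0
    for i, s in enumerate(np.diff(smooth)):
        if abs(s) < tol:  # ignore near-flat regions
            continue
        sign = 1 if s > 0 else -1
        if last_sign and sign != last_sign:
            boundaries.append(i)
        last_sign = sign
    return boundaries

# Synthetic three-stage curve: rise, fall, rise (breaks at 100 and 200).
t = np.arange(100)
curve = np.concatenate([0.01 * t, 1.0 - 0.005 * t, 0.5 + 0.02 * t])
print(stage_boundaries(curve))  # two boundaries near indices 100 and 200
```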
george (@georgeyw_):

1/ How do attention heads form?

With our new approach, we show that attention heads have distinct developmental signatures. These signatures reveal how heads develop distinct functional roles specialized to different subsets of data. In the process, we discover a new circuit.
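One hypothetical way to picture a "developmental signature" (a stand-in, not the method from the tweet): track a scalar statistic per head across training checkpoints and cluster the resulting trajectories by shape, so that heads with similar developmental profiles group together. The data, statistic, and cluster count below are all placeholders.

```python
# Hypothetical sketch: cluster per-head developmental trajectories.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_checkpoints, n_layers, n_heads = 50, 2, 8

# Placeholder trajectories; in practice, e.g. each head's mean attention
# entropy on a fixed evaluation batch at every training checkpoint.
traj = rng.normal(size=(n_layers * n_heads, n_checkpoints)).cumsum(axis=1)

# Standardize each trajectory so clustering compares shape, not scale.
traj = (traj - traj.mean(axis=1, keepdims=True)) / traj.std(axis=1, keepdims=True)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(traj)
for head, lab in enumerate(labels):
    layer, idx = divmod(head, n_heads)
    print(f"layer {layer} head {idx}: cluster {lab}")
```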