Alex Damian (@alex_damian_)'s Twitter Profile
Alex Damian

@alex_damian_

ID: 1374506243612536841

Joined: 23-03-2021 23:40:10

13 Tweets

290 Followers

84 Following

Sepp Hochreiter (@hochreitersepp)

ArXiv arxiv.org/abs/2209.15594: analysis of SGD. Sharpness (the largest eigenvalue of the Hessian) steadily increases during training until the instability cutoff 2/η, then hovers around 2/η. The training loss still decreases. Reason: self-stabilization via the cubic term in the Taylor expansion.
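
A minimal sketch of how one could watch this happen (not the paper's code; the toy model, data, step size, and iteration counts below are illustrative assumptions): run full-batch gradient descent while estimating the sharpness, i.e. the top Hessian eigenvalue, with power iteration on Hessian-vector products, and compare it to the 2/η threshold.

```python
import torch

torch.manual_seed(0)
X, y = torch.randn(64, 4), torch.randn(64, 1)
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
params = list(model.parameters())
eta = 0.1                                      # step size; stability threshold is 2/eta

def loss_fn():
    return torch.nn.functional.mse_loss(model(X), y)

def sharpness(n_iters=20):
    """Estimate lambda_max of the loss Hessian by power iteration on
    Hessian-vector products (Pearlmutter's trick)."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]             # re-normalize the candidate eigenvector
    hv = torch.autograd.grad(grads, params, grad_outputs=v)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()   # Rayleigh quotient v^T H v

for step in range(501):
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * g                       # full-batch gradient descent step
    if step % 50 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.2f}  (2/eta = {2 / eta:.1f})")
```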

Eshaan Nichani (@eshaannichani)

New paper with Alex Damian and Jason Lee! We identify a new implicit bias of GD: Self-Stabilization. When the loss is too sharp and iterates begin to diverge, self-stabilization decreases sharpness until GD is stable. This explains the “Edge of Stability” phenomenon! (1/3)
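
A hedged sketch of the mechanism as described in the thread and the paper's abstract (my paraphrase of arxiv.org/abs/2209.15594; the notation is illustrative, not necessarily the paper's):

```latex
% Let S(\theta) = \lambda_{\max}(\nabla^2 L(\theta)) be the sharpness and u the
% corresponding top eigenvector. Write the iterate as \theta + x, where x is the
% displacement that blows up along u once S(\theta) > 2/\eta. Expanding the
% gradient to second order in x:
\[
  \nabla L(\theta + x) \;\approx\; \nabla L(\theta) \;+\; \nabla^2 L(\theta)\,x
  \;+\; \tfrac{1}{2}\,\nabla^3 L(\theta)[x, x].
\]
% With x \approx \alpha u, the cubic term can be read via the identity
\[
  \nabla^3 L(\theta)[u, u] \;=\; \nabla\big(u^\top \nabla^2 L(\theta)\,u\big)
  \;\approx\; \nabla S(\theta),
\]
% so each gradient step picks up an extra component of roughly
% -\tfrac{\eta\,\alpha^2}{2}\,\nabla S(\theta): implicit descent on the sharpness
% itself, which pushes S(\theta) back down toward 2/\eta.
```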

Zhiyuan Li (@zhiyuanli_)

🚨💡We are organizing a workshop on Mathematics of Modern Machine Learning (M3L) at #NeurIPS2023!

🚀Join us if you are interested in exploring theories for understanding and advancing modern ML practice. sites.google.com/view/m3l-2023

Submission deadline: October 2, 2023
M3L Workshop @ NeurIPS 2024
M3L Workshop @ NeurIPS 2024 (@m3lworkshop)

Hope everyone had a great time at M3L today! Many thanks to the speakers, authors, reviewers, participants and volunteers for all your contributions that made this workshop fun and successful, we hope to see you again next year! 😃✨

fly51fly (@fly51fly)

[LG] How Transformers Learn Causal Structure with Gradient Descent
E. Nichani, A. Damian, J. D. Lee [Princeton University] (2024)
arxiv.org/abs/2402.14735

- The paper studies how transformers learn causal structure through gradient descent when trained on a novel in-context learning …
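
Not the paper's model, just a hedged sketch of the basic building block involved: a single causal self-attention head, where the causal mask lets position t attend only to positions ≤ t, so the learned attention pattern describes which earlier tokens each position treats as its "parents". The class name and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class CausalSelfAttention(torch.nn.Module):
    """A generic single-head causal self-attention layer (illustrative only)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model, bias=False)
        self.k = torch.nn.Linear(d_model, d_model, bias=False)
        self.v = torch.nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                                  # x: (batch, seq, d_model)
        T = x.size(1)
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.size(-1) ** 0.5
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))   # block attention to future tokens
        attn = scores.softmax(dim=-1)                      # rows: distributions over past positions
        return attn @ self.v(x), attn

# The returned lower-triangular `attn` matrix is the attention pattern whose
# evolution under gradient descent is the object of study.
layer = CausalSelfAttention(d_model=8)
out, attn = layer(torch.randn(1, 5, 8))
print(attn[0])
```
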
Eshaan Nichani (@eshaannichani)

Causal self-attention encodes causal structure between tokens (e.g. induction heads, learning a function class in-context, n-grams). But how do transformers learn this causal structure via gradient descent?

New paper with Alex Damian and Jason Lee!

arxiv.org/abs/2402.14735

(1/10)
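
One concrete example of "causal structure between tokens" named above is the induction head. A hedged toy version of the rule it implements (my own illustration, unrelated to the paper's code; the function name is hypothetical): to predict the next token, find the most recent earlier occurrence of the current token and copy whatever followed it. A trained causal-attention head implementing this rule attends from the current position back to that earlier position.

```python
def induction_head_predict(tokens):
    """Return the induction-head prediction for the token after tokens[-1], or None."""
    current = tokens[-1]
    # scan the past (most recent first) for a previous occurrence of `current`
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]          # copy the token that followed it
    return None

# "A B C A B C A" -> the last "A" was previously followed by "B"
print(induction_head_predict(list("ABCABCA")))   # prints: B
```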