Felix Dangel (@f_dangel)'s Twitter Profile
Felix Dangel

@f_dangel

Postdoc at the Vector Institute

ID: 1432066673432006662

Joined: 29-08-2021 19:45:03

8 Tweets

148 Followers

70 Following

Felix Dangel (@f_dangel):

I'm excited to announce basic support for ResNets & RNNs in BackPACK 1.4 for PyTorch! 🎉 Find out more in the tutorials: 📈 docs.backpack.pt/en/1.4.0/use_c… 📈 docs.backpack.pt/en/1.4.0/use_c… Thanks to Tim Schäfer for his work on the library in the past months 🙏.

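For readers new to the library, here is a minimal sketch of the basic BackPACK workflow with a plain feed-forward model; the linked tutorials describe the extra setup that ResNets and RNNs need. Model, data, and the chosen extension below are illustrative.

```python
import torch
from torch.nn import CrossEntropyLoss, Linear, ReLU, Sequential
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# Extend model and loss so BackPACK can hook into the backward pass.
model = extend(Sequential(Linear(10, 32), ReLU(), Linear(32, 3)))
loss_func = extend(CrossEntropyLoss())

X, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
loss = loss_func(model(X), y)

# Request per-sample gradients alongside the usual backward pass.
with backpack(BatchGrad()):
    loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad.shape, param.grad_batch.shape)  # grad_batch has a leading batch axis
```
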
Alexander Immer (@a1mmer):

In our #NeurIPS2021 paper (arxiv.org/abs/2106.14806), we introduce laplace-torch for effortless Bayesian deep learning. Despite their simplicity, we find that Laplace approximations are surprisingly competitive with more popular approaches. youtu.be/nMONiYLWWOU

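A minimal sketch of how laplace-torch is typically used: a post-hoc, last-layer Laplace approximation with a Kronecker-factored Hessian. The model, data, and settings below are illustrative placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace

# Illustrative (already trained) classifier and data; any torch.nn.Module works.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
X, y = torch.randn(128, 10), torch.randint(0, 3, (128,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)

# Post-hoc Laplace approximation over the last layer with a Kronecker-factored Hessian.
la = Laplace(model, likelihood="classification",
             subset_of_weights="last_layer", hessian_structure="kron")
la.fit(train_loader)
la.optimize_prior_precision(method="marglik")

# Bayesian predictive distribution for new inputs.
probs = la(torch.randn(4, 10))
```
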
Felix Dangel (@f_dangel):

Which plane would you board? [#NeurIPS2021] Cockpit: Practical trouble-shooting of DNN training. Empowered by recent advances in autodiff. In collaboration with Frank Schneider & @PhilippHennig5.

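Cockpit provides a live "instrument panel" of training diagnostics. The snippet below tracks two such quantities (gradient norm and relative update size) in plain PyTorch purely for illustration; it is not Cockpit's API, which builds on tools like BackPACK and offers many more instruments.

```python
import torch

# Illustrative model, data, and optimizer.
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
X, y = torch.randn(64, 10), torch.randn(64, 1)

for step in range(5):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()

    with torch.no_grad():
        # Two simple "instruments": global gradient norm and relative update size.
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
        before = [p.clone() for p in model.parameters()]
        opt.step()
        rel_update = torch.norm(torch.stack(
            [(p - b).norm() / (b.norm() + 1e-12) for p, b in zip(model.parameters(), before)]
        ))

    print(f"step {step}: loss={loss.item():.4f}  "
          f"grad_norm={grad_norm.item():.4f}  rel_update={rel_update.item():.4f}")
```
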
Agustinus Kristiadi (@akristiadi7):

The consensus in deep learning is that many quantities are not invariant under reparametrization. Our #NeurIPS2023 paper shows that they actually are if the implicitly assumed Riemannian metric is taken into account 🧵 arxiv.org/abs/2302.07384 w/ Felix Dangel and @PhilippHennig5

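A one-dimensional toy illustration of the claim (my sketch, not the paper's construction): under the rescaling reparametrization w = c·v, the raw Hessian, a common flatness measure, picks up a factor of c², but dividing by the metric g = c² induced by the reparametrization gives the same number in both parametrizations.

```python
import torch
from torch.autograd.functional import hessian

def loss(w):
    return (w ** 4 - w ** 2).sum()

w, c = torch.tensor([0.7]), 3.0

H_w = hessian(loss, w)                       # curvature in the original parametrization
H_v = hessian(lambda v: loss(c * v), w / c)  # curvature after reparametrizing w = c * v

print(H_v / H_w)         # ≈ c**2: the raw Hessian is not invariant
print(H_v / c**2 - H_w)  # ≈ 0: accounting for the induced metric g = c**2 restores invariance
```
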
Wu Lin (@linyorker):

For the first time, we (with Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani) propose a sparse 2nd-order method for large NN training with BFloat16 and show its advantages over AdamW. Also at the @NeurIPS workshop on Opt for ML: arxiv.org/abs/2312.05705 /1

Wu Lin (@linyorker):

#ICML2024 Can We Remove the Square-Root in Adaptive Methods? arxiv.org/abs/2402.03496 Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW). Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16. /1

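To make the "remove the square root" idea concrete, here is a schematic contrast between a root-based and a root-free diagonal update. This is illustrative only: the paper derives its root-free methods more carefully, and the two learning rates are not directly comparable.

```python
import torch

def adam_like_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Root-based adaptive update: precondition by the square root of the second moment."""
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    p.add_(m / (v.sqrt() + eps), alpha=-lr)

def root_free_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Root-free variant: precondition by the second moment itself (no square root),
    which changes the update's scale and its invariance properties."""
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    p.add_(m / (v + eps), alpha=-lr)
```
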
Weronika Ormaniec (@wormaniec):

Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique? With Sidak Pal Singh & Felix Dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper! Key insights👇 1/8

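For readers who want to probe loss-landscape curvature themselves, a toy sketch of computing parameter-space Hessian blocks with torch.func. The tiny linear model and data here are placeholders, nothing like the Transformer-scale analysis in the paper.

```python
import torch
from torch.func import functional_call, hessian

# Tiny illustrative model and data.
model = torch.nn.Linear(5, 1)
X, y = torch.randn(16, 5), torch.randn(16, 1)
params = dict(model.named_parameters())

def loss_fn(params):
    pred = functional_call(model, params, (X,))
    return torch.nn.functional.mse_loss(pred, y)

# Nested dict of Hessian blocks between all parameter pairs.
H = hessian(loss_fn)(params)
print(H["weight"]["weight"].shape)  # torch.Size([1, 5, 1, 5])
```
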
Felix Dangel (@f_dangel):

KFAC is everywhere, from optimization to influence functions. While the intuition is simple, implementation is tricky. We (Bálint Mucsányi, Tobias Weber, Runa Eschenhagen) wrote a ground-up intro with code to help you get it right. 📖 arxiv.org/abs/2507.05127 💻 github.com/f-dangel/kfac-…
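For a flavor of what the tutorial builds up, here is the core KFAC idea for a single linear layer, sketched with random stand-in tensors. The Kronecker ordering depends on how the weight matrix is vectorized; see the paper and repository for the exact conventions used there.

```python
import torch

# Stand-in quantities for one linear layer (D_in -> D_out) over a batch of N samples.
N, D_in, D_out = 32, 10, 4
a = torch.randn(N, D_in)   # layer inputs
g = torch.randn(N, D_out)  # gradients of the loss w.r.t. the layer outputs

# KFAC approximates the layer's Fisher/GGN block by a Kronecker product of two
# small factors, estimated as batch averages.
A = a.T @ a / N            # (D_in, D_in)  input second moment
B = g.T @ g / N            # (D_out, D_out) output-gradient second moment
F_kfac = torch.kron(B, A)  # (D_out*D_in, D_out*D_in); kept factored in practice

# The Kronecker structure makes inversion cheap: invert the two small factors
# instead of the full curvature block.
F_inv = torch.kron(torch.linalg.inv(B), torch.linalg.inv(A))
```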