Felix Dangel (@f_dangel)'s Twitter Profile
Felix Dangel

@f_dangel

Postdoc at the Vector Institute

ID: 1432066673432006662

Joined: 29-08-2021 19:45:03

8 Tweets

148 Followers

70 Following

Felix Dangel (@f_dangel):

I'm excited to announce basic support for ResNets & RNNs in BackPACK 1.4 for PyTorch! 🎉 Find out more in the tutorials: 📈 docs.backpack.pt/en/1.4.0/use_c… 📈 docs.backpack.pt/en/1.4.0/use_c… Thanks to Tim Schäfer for his work on the library in the past months 🙏.

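For readers new to the library, here is a minimal sketch of the basic BackPACK workflow with a plain feed-forward model; the linked tutorials describe the extra setup that ResNets and RNNs need. Model, data, and the chosen extension below are illustrative.

```python
import torch
from torch.nn import CrossEntropyLoss, Linear, ReLU, Sequential
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# Extend model and loss so BackPACK can hook into the backward pass.
model = extend(Sequential(Linear(10, 32), ReLU(), Linear(32, 3)))
loss_func = extend(CrossEntropyLoss())

X, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
loss = loss_func(model(X), y)

# Request per-sample gradients alongside the usual backward pass.
with backpack(BatchGrad()):
    loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad.shape, param.grad_batch.shape)  # grad_batch has a leading batch axis
```
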
Alexander Immer (@a1mmer):

In our #NeurIPS2021 paper (arxiv.org/abs/2106.14806), we introduce laplace-torch for effortless Bayesian deep learning. Despite their simplicity, we find that Laplace approximations are surprisingly competitive with more popular approaches. youtu.be/nMONiYLWWOU

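A minimal sketch of how laplace-torch is typically used: a post-hoc, last-layer Laplace approximation with a Kronecker-factored Hessian. The model, data, and settings below are illustrative placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from laplace import Laplace

# Illustrative (already trained) classifier and data; any torch.nn.Module works.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
X, y = torch.randn(128, 10), torch.randint(0, 3, (128,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32)

# Post-hoc Laplace approximation over the last layer with a Kronecker-factored Hessian.
la = Laplace(model, likelihood="classification",
             subset_of_weights="last_layer", hessian_structure="kron")
la.fit(train_loader)
la.optimize_prior_precision(method="marglik")

# Bayesian predictive distribution for new inputs.
probs = la(torch.randn(4, 10))
```
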
Felix Dangel (@f_dangel):

Which plane would you board? [#NeurIPS2021] Cockpit: Practical trouble-shooting of DNN training. Empowered by recent advances in autodiff. In collaboration with Frank Schneider & @PhilippHennig5.

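Cockpit provides a live "instrument panel" of training diagnostics. The snippet below tracks two such quantities (gradient norm and relative update size) in plain PyTorch purely for illustration; it is not Cockpit's API, which builds on tools like BackPACK and offers many more instruments.

```python
import torch

# Illustrative model, data, and optimizer.
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
X, y = torch.randn(64, 10), torch.randn(64, 1)

for step in range(5):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()

    with torch.no_grad():
        # Two simple "instruments": global gradient norm and relative update size.
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
        before = [p.clone() for p in model.parameters()]
        opt.step()
        rel_update = torch.norm(torch.stack(
            [(p - b).norm() / (b.norm() + 1e-12) for p, b in zip(model.parameters(), before)]
        ))

    print(f"step {step}: loss={loss.item():.4f}  "
          f"grad_norm={grad_norm.item():.4f}  rel_update={rel_update.item():.4f}")
```
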
Agustinus Kristiadi (@akristiadi7):

The consensus in deep learning is that many quantities are not invariant under reparametrization. Our #NeurIPS2023 paper shows that they actually are if the implicitly assumed Riemannian metric is taken into account 🧵 arxiv.org/abs/2302.07384 w/ Felix Dangel and @PhilippHennig5

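A one-dimensional toy illustration of the claim (my sketch, not the paper's construction): under the rescaling reparametrization w = c·v, the raw Hessian, a common flatness measure, picks up a factor of c², but dividing by the metric g = c² induced by the reparametrization gives the same number in both parametrizations.

```python
import torch
from torch.autograd.functional import hessian

def loss(w):
    return (w ** 4 - w ** 2).sum()

w, c = torch.tensor([0.7]), 3.0

H_w = hessian(loss, w)                       # curvature in the original parametrization
H_v = hessian(lambda v: loss(c * v), w / c)  # curvature after reparametrizing w = c * v

print(H_v / H_w)         # ≈ c**2: the raw Hessian is not invariant
print(H_v / c**2 - H_w)  # ≈ 0: accounting for the induced metric g = c**2 restores invariance
```
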
Wu Lin (@linyorker):

For the first time, we (with Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani) propose a sparse 2nd-order method for large NN training with BFloat16 and show its advantages over AdamW. Also at the @NeurIPS workshop on Opt for ML: arxiv.org/abs/2312.05705 /1

Wu Lin (@linyorker):

#ICML2024 Can We Remove the Square-Root in Adaptive Methods? arxiv.org/abs/2402.03496 Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW). Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16. /1

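To make the "remove the square root" idea concrete, here is a schematic contrast between a root-based and a root-free diagonal update. This is illustrative only: the paper derives its root-free methods more carefully, and the two learning rates are not directly comparable.

```python
import torch

def adam_like_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Root-based adaptive update: precondition by the square root of the second moment."""
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    p.add_(m / (v.sqrt() + eps), alpha=-lr)

def root_free_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Root-free variant: precondition by the second moment itself (no square root),
    which changes the update's scale and its invariance properties."""
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    p.add_(m / (v + eps), alpha=-lr)
```
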
Weronika Ormaniec (@wormaniec):

Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique? With Sidak Pal Singh & Felix Dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper! Key insights👇 1/8

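For readers who want to probe loss-landscape curvature themselves, a toy sketch of computing parameter-space Hessian blocks with torch.func. The tiny linear model and data here are placeholders, nothing like the Transformer-scale analysis in the paper.

```python
import torch
from torch.func import functional_call, hessian

# Tiny illustrative model and data.
model = torch.nn.Linear(5, 1)
X, y = torch.randn(16, 5), torch.randn(16, 1)
params = dict(model.named_parameters())

def loss_fn(params):
    pred = functional_call(model, params, (X,))
    return torch.nn.functional.mse_loss(pred, y)

# Nested dict of Hessian blocks between all parameter pairs.
H = hessian(loss_fn)(params)
print(H["weight"]["weight"].shape)  # torch.Size([1, 5, 1, 5])
```
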
Felix Dangel (@f_dangel):

KFAC is everywhere, from optimization to influence functions. While the intuition is simple, implementation is tricky. We (Bálint Mucsányi, Tobias Weber, Runa Eschenhagen) wrote a ground-up intro with code to help you get it right. 📖 arxiv.org/abs/2507.05127 💻 github.com/f-dangel/kfac-…
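For a flavor of what the tutorial builds up, here is the core KFAC idea for a single linear layer, sketched with random stand-in tensors. The Kronecker ordering depends on how the weight matrix is vectorized; see the paper and repository for the exact conventions used there.

```python
import torch

# Stand-in quantities for one linear layer (D_in -> D_out) over a batch of N samples.
N, D_in, D_out = 32, 10, 4
a = torch.randn(N, D_in)   # layer inputs
g = torch.randn(N, D_out)  # gradients of the loss w.r.t. the layer outputs

# KFAC approximates the layer's Fisher/GGN block by a Kronecker product of two
# small factors, estimated as batch averages.
A = a.T @ a / N            # (D_in, D_in)  input second moment
B = g.T @ g / N            # (D_out, D_out) output-gradient second moment
F_kfac = torch.kron(B, A)  # (D_out*D_in, D_out*D_in); kept factored in practice

# The Kronecker structure makes inversion cheap: invert the two small factors
# instead of the full curvature block.
F_inv = torch.kron(torch.linalg.inv(B), torch.linalg.inv(A))
```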