tensorqt (@tensorqt)'s Twitter Profile
tensorqt

@tensorqt

chaos dancing star

ID: 1496062602253942784

22-02-2022 10:01:49

4.4K Tweets

1.1K Followers

300 Following

tensorqt (@tensorqt)

doesn't have to be about the past. the further we are from an external, properly received feedback, the more delusional our response will be

leloy! (@leloykun)

I've finally solved steepest descent on Finsler-structured (matrix) manifolds more generally. This generalizes work by me, Jeremy Bernstein, and jianlin.su on Muon, Orthogonal Muon, & Stiefel Muon. --- The general solution turned out to be much simpler than I thought. And it should

tensorqt (@tensorqt)

i think more hyperparameters are, in general, a good thing: each hyperparameter is (obviously) parametrizing a class of instances of your systems, possibly along tradeoffs you can exploit. The problem is when the hp exploration becomes too costly to even attempt, and standard

Alexander Doria (@dorialexander)

Blogpost to read today: strong argument that the oversquashing of attention in the first tokens is not something learned from data distribution (like model should naturally "care" about the start of the text to grasp the rest) but a fundamental feature of attention graph.

Lev Telyatnikov (@lev_telyatnikov)

Thrilled to announce TopoBench has been accepted to DMLR! 🚀 It’s a modular library for Topological Deep Learning, built to provide reproducible, cross-domain benchmarks and accelerate research. GitHub: github.com/geometric-inte… #AI #DeepLearning

Petar Veličković (@petarv_93)

tensorqt that is cool!! i like your perspective on this (it differs from my own slightly) and will likely use it in my future talks :) "In the strictly lower-triangular case (no self-loops) this is a nilpotent operator, so sufficiently high powers collapse entirely into the earliest
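
The nilpotency remark in the quoted passage can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the uniform attention pattern and the helper names (`strictly_lower_attention`, `first_zero_power`) are hypothetical and do not come from the thread or the blog post. It builds a strictly lower-triangular matrix (each token attends only to strictly earlier tokens, no self-loops) and finds the first power that is exactly zero.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def strictly_lower_attention(n):
    """Hypothetical causal pattern with no self-loops (an assumption):
    token i attends uniformly to tokens 0..i-1; token 0 attends to nothing."""
    return [[1.0 / i if j < i else 0.0 for j in range(n)] for i in range(n)]


def first_zero_power(a):
    """Return the smallest k >= 1 such that a**k is the zero matrix,
    searching k = 1..n, or None if no such k is found."""
    n = len(a)
    power = a
    for k in range(1, n + 1):
        if all(x == 0.0 for row in power for x in row):
            return k
        power = matmul(power, a)
    return None


A = strictly_lower_attention(4)
nilpotency_index = first_zero_power(A)  # 4 for this 4x4 example
```

For any strictly lower-triangular n×n matrix the n-th power is zero. In this toy case the last nonzero power, the third, has its single nonzero entry in the first column, so all remaining weight sits on the earliest token before the operator vanishes.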