Maurice Weiler (@maurice_weiler) 's Twitter Profile
Maurice Weiler

@maurice_weiler

AI researcher with a focus on geometric DL and equivariant CNNs. PhD with Max Welling. Master's degree in physics.

ID: 955017329863221248

Website: https://maurice-weiler.gitlab.io/ · Joined: 21-01-2018 10:00:49

760 Tweets

3.3K Followers

1.1K Following

Aditi Krishnapriyan (@ask1729) 's Twitter Profile Photo

Neural Spectral Methods (spectral-based neural operator + spectral loss training method) will be presented at Poster session 4 this Wed (May 8, 4:30 PM) at #ICLR2024! Paper: openreview.net/forum?id=2DbVe… Code: github.com/ASK-Berkeley/N…
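
As background for the "spectral loss" idea mentioned above, here is a minimal sketch of evaluating a PDE residual on Fourier coefficients rather than on grid samples. This is my own illustration with a hypothetical helper, not the paper's implementation:

import numpy as np

def spectral_poisson_loss(u_hat, f_hat, length=2 * np.pi):
    # u_hat, f_hat: Fourier coefficients of the predicted solution u and the forcing f
    # for the 1D periodic Poisson problem u_xx = f.
    n = u_hat.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)    # angular wavenumbers
    residual_hat = (1j * k) ** 2 * u_hat - f_hat       # u_xx - f in Fourier space
    return np.sum(np.abs(residual_hat) ** 2)           # by Parseval, an L2 residual norm

# e.g. loss = spectral_poisson_loss(np.fft.fft(u_pred), np.fft.fft(f))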

Tycho van der Ouderaa (@tychovdo) 's Twitter Profile Photo

🌟New work: Noether's razor⭐️ Our NeurIPS 2024 paper connects ML symmetries to conserved quantities through a seminal result in mathematical physics: Noether's theorem. We can learn neural network symmetries from data by learning associated conservation laws. Learn more👇. 1/16🧵

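For readers who want the underlying result in its textbook Lagrangian-mechanics form (standard statement, not the paper's notation): Noether's theorem assigns to every continuous symmetry of the action a conserved quantity,

\[
L(q, \dot q) \;=\; L\big(q + \epsilon\,\psi(q),\ \dot q + \epsilon\,\dot\psi(q)\big) + O(\epsilon^2)
\quad\Longrightarrow\quad
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\,\psi(q)\right) = 0
\]

along solutions of the Euler–Lagrange equations. This duality is what lets the paper learn symmetries from data by learning the associated conservation laws.
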
Sham Kakade (@shamkakade6) 's Twitter Profile Photo

1/5⚡Introducing Flash Inference: an *exact* method cutting inference time for Long Convolution Sequence Models (LCSMs) to near-linear O(L log² L) complexity! Faster inference, same precision. Learn how we accelerate LCSM inference.

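For context on where the speed-up lives: a long-convolution layer applies a filter as long as the sequence itself, which a plain FFT already evaluates in O(L log L) per layer. The sketch below shows only that standard building block (my illustration), not the Flash Inference algorithm itself, whose contribution is reaching near-linear cost at inference time:

import numpy as np

def long_convolution(x, h):
    # Causal convolution of a length-L sequence x with a length-L filter h in O(L log L).
    L = len(x)
    n = 2 * L                                  # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
    return y[:L]                               # keep the first L (causal) outputs
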
Jeremy Bernstein (@jxbz) 's Twitter Profile Photo

Over the past month, methods developed by myself and my collaborators were used to set new speed records for training LLMs up to 1.5B scale. I also want to help the science go faster, so now get ready for: ~The General Theory of Modular Duality~ arxiv.org/abs/2410.21265 (1/9)

Johann Brehmer (@johannbrehmer) 's Twitter Profile Photo

Does equivariance matter when you have lots of data and compute? In a new paper with Sönke Behrends, Pim de Haan, and Taco Cohen, we collect some evidence. arxiv.org/abs/2410.23179 1/7

Maurice Weiler (@maurice_weiler) 's Twitter Profile Photo

Very cool paper on "Generator Matching", a method to parametrize and train Markov processes via their infinitesimal generators. Besides flows and diffusion, it includes jumps over finite distances and makes it possible to generate continuous and discrete random variables jointly🦾
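
For context, the infinitesimal generator of a Markov process $(X_t)$ acts on test functions $f$ as (standard definition, not notation specific to the paper)

\[
(\mathcal{A} f)(x) \;=\; \lim_{t \to 0^+} \frac{\mathbb{E}\big[ f(X_t) \mid X_0 = x \big] - f(x)}{t},
\]

and specializes to a drift term for deterministic flows, an additional second-order (diffusion) term for SDEs, and an integral term for jump processes, which is what lets one framework cover all three.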

Aditi Krishnapriyan (@ask1729) 's Twitter Profile Photo

1/ What are key design principles for scaling neural network interatomic potentials? Our exploration leads us to top results on the Open Catalyst Project (OC20, OC22), SPICE, and MPTrj, with vastly improved efficiency! Accepted at #NeurIPS2024: arxiv.org/abs/2410.24169

Ruben Ohana (@oharub) 's Twitter Profile Photo

Generating cat videos is nice, but what if you could tackle real scientific problems with the same methods? 🧪🌌 Introducing The Well: 16 datasets (15TB) for Machine Learning, from astrophysics to fluid dynamics and biology. 🐙: github.com/PolymathicAI/t… 📜: openreview.net/pdf?id=00Sx577…

Zhou Xian (@zhou_xian_) 's Twitter Profile Photo

Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics

Peter Holderrieth (@peholderrieth) 's Twitter Profile Photo

New paper out! We introduce “LEAPS”, a neural sampling algorithm for discrete distributions via continuous-time Markov chains (“discrete diffusion”). We introduce a novel importance sampling scheme and novel symmetries built into neural networks. arxiv.org/pdf/2502.10843 (1/4)

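For background, a continuous-time Markov chain on a discrete state space evolves its marginals by the Kolmogorov forward (master) equation (generic form, not the paper's specific parametrization):

\[
\frac{d}{dt}\, p_t(x) \;=\; \sum_{y \neq x} \Big( Q_t(y, x)\, p_t(y) \;-\; Q_t(x, y)\, p_t(x) \Big),
\]

where $Q_t(x, y) \ge 0$ is the jump rate from state $x$ to state $y$; in "discrete diffusion" samplers of this kind, these rates are what the neural network parametrizes.
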
Shubhendu Trivedi (@_onionesque) 's Twitter Profile Photo

This work is quite natural, even though a lot of it looks like this (it will take a while to work out the details). What they do is to use general integral operators to write the non-linear maps. These then correspond to specific characterizations of the space of kernels.

Nabil Iqbal (@nblqbl) 's Twitter Profile Photo

The arxiv preprint on our conformally equivariant neural network -- named AdS-GNN due to its secret origins in AdS/CFT -- is now out! arxiv.org/abs/2505.12880 🧵explaining it below. Joint work with the amazing team of Max Zhdanov, Erik Bekkers and Patrick Forre.

Maurice Weiler (@maurice_weiler) 's Twitter Profile Photo

New preprint! We extend Taco Cohen's theory of equivariant CNNs on homogeneous spaces to the non-linear setting. Beyond convolutions, this covers equivariant attention, implicit kernel MLPs and more general message passing layers. More details in Oscar Carlsson's thread 👇

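For reference, the linear (convolutional) case this builds on constrains kernels by the stabilizer subgroup; in one common steerable-CNN formulation (standard background, not the new paper's general statement),

\[
k(h \cdot x) \;=\; \rho_{\text{out}}(h)\, k(x)\, \rho_{\text{in}}(h)^{-1} \qquad \text{for all } h \in H,
\]

where $H$ is the stabilizer subgroup of the homogeneous space and $\rho_{\text{in}}, \rho_{\text{out}}$ are the input and output field representations; the preprint extends beyond such linear kernels to non-linear equivariant operations.
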
Katie Everett (@_katieeverett) 's Twitter Profile Photo

1. We often observe power laws between loss and compute: loss = a * flops ^ b + c
2. Models are rapidly becoming more efficient, i.e. use less compute to reach the same loss

But: which innovations actually change the exponent in the power law (b) vs change only the constant (a)?
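
A toy numeric illustration of the distinction (made-up numbers, purely to show why the exponent matters more as compute grows):

# Compare changing the constant a vs. the exponent b in  loss = a * flops**b + c.
a, b, c = 10.0, -0.1, 1.0

def loss(flops, a=a, b=b, c=c):
    return a * flops ** b + c

for flops in [1e18, 1e20, 1e22]:
    base      = loss(flops)               # baseline scaling law
    half_a    = loss(flops, a=a / 2)      # constant improvement: same slope, curve shifted down
    steeper_b = loss(flops, b=-0.12)      # exponent improvement: relative gains grow with compute
    print(f"{flops:.0e}  base={base:.3f}  half_a={half_a:.3f}  steeper_b={steeper_b:.3f}")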

Ekdeep Singh Lubana (@ekdeepl) 's Twitter Profile Photo

🚨 New paper alert! Linear representation hypothesis (LRH) argues concepts are encoded as **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
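
For readers unfamiliar with the setup: a sparse autoencoder (SAE) of the kind referenced here reconstructs an activation vector as a sparse non-negative combination of learned dictionary directions. A generic minimal sketch (dimensions and names are hypothetical, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 64, 512                       # hypothetical sizes
W_enc = rng.normal(size=(d_dict, d_model)) * 0.02
W_dec = rng.normal(size=(d_model, d_dict)) * 0.02
b_enc = np.zeros(d_dict)

def sae(x, l1=1e-3):
    f = np.maximum(W_enc @ x + b_enc, 0.0)      # sparse codes (ReLU)
    x_hat = W_dec @ f                           # reconstruction as a sum of dictionary directions
    loss = np.sum((x - x_hat) ** 2) + l1 * np.sum(np.abs(f))
    return x_hat, f, loss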

Alejandro García (@algarciacast) 's Twitter Profile Photo

🌍 From earthquake prediction to robot navigation - what connects them? Eikonal equations! We developed E-NES: a neural network that leverages geometric symmetries to solve entire families of velocity fields through group transformations. Grid-free and scalable! 🧵👇
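
The equation family in question, in its standard form (details of the paper's setup may differ):

\[
\lVert \nabla T(x) \rVert \;=\; \frac{1}{v(x)},
\]

where $T(x)$ is the travel time from a source to $x$ and $v(x)$ is the local propagation speed; earthquake travel-time prediction and robot navigation both amount to solving this equation for different velocity fields $v$.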

Alex Vacca (@itsalexvacca) 's Twitter Profile Photo

BREAKING: MIT just completed the first brain scan study of ChatGPT users & the results are terrifying. Turns out, AI isn't making us more productive. It's making us cognitively bankrupt. Here's what 4 months of data revealed: (hint: we've been measuring productivity all wrong)

Kirill Neklyudov (@k_neklyudov) 's Twitter Profile Photo

(1/n) Sampling from the Boltzmann density better than Molecular Dynamics (MD)? It is possible with PITA 🫓 Progressive Inference Time Annealing! A spotlight at the GenBio Workshop @ ICML 2025! PITA learns from "hot," easy-to-explore molecular states 🔥 and then cleverly "cools"

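For context, the "hot" and "cooled" states refer to tempered Boltzmann densities (standard statistical-mechanics background, not the paper's notation):

\[
p_\beta(x) \;\propto\; \exp\!\big(-\beta\, E(x)\big), \qquad \beta = \frac{1}{k_B T},
\]

so high-temperature (small $\beta$) densities are smoother and easier to explore, and annealing increases $\beta$ progressively towards the target distribution.
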
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data