Chaitanya K. Joshi @ICLR2025 🇸🇬

@chaitjo

PhD student @Cambridge_CL. Research Intern @AIatMeta and @PrescientDesign. Interested in Deep Learning for biomolecule design. Organising @LoGConference.

ID: 166216664

Website: https://www.chaitjo.com/ · Joined: 13-07-2010 16:36:04

2.2K Tweets

7.7K Followers

1.1K Following

Chaitanya K. Joshi @ICLR2025 🇸🇬 (@chaitjo) 's Twitter Profile Photo

After a long hiatus, I've started blogging again! My first post was a difficult one to write, because I don't want to keep repeating what's already in papers. I tried to give some nuanced and (hopefully) fresh takes on equivariance and geometry in molecular modelling.

Soledad Villar (@soledadvillar5) 's Twitter Profile Photo

Chaitanya K. Joshi It's possible that data aug + a simpler model outperforms SE(3) or E(3) equivariance in practice. The effective sample complexity gain is a constant independent of the input size. But learning a permutation symmetry from data will never perform better IMO (unless the data is tiny)
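
(A minimal sketch, for context, of the alternative Soledad contrasts with built-in equivariance: training a plain, non-equivariant point-cloud model with random SO(3) rotations as data augmentation. The training-loop names are hypothetical.)

```python
# Sketch: SO(3) data augmentation instead of an equivariant architecture.
import torch
from scipy.spatial.transform import Rotation

def random_rotate(pos: torch.Tensor) -> torch.Tensor:
    """Apply one uniformly sampled 3D rotation to an (N, 3) point cloud."""
    R = torch.tensor(Rotation.random().as_matrix(), dtype=pos.dtype)
    return pos @ R.T

# Inside a hypothetical training loop:
# for pos, target in loader:
#     pos = random_rotate(pos)              # augmentation stands in for equivariance
#     loss = criterion(model(pos), target)
#     loss.backward(); optimizer.step()
```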

Petar Veličković (@petarv_93) 's Twitter Profile Photo

A great post as always from Chaitanya, pls go and read it 👇 It also made me reflect on a point... we've (as a field) done the equivariant approach a disservice by calling a model 'equivariant' only if it has 3D spatial symmetries -- and this is a (relatively) simple action

Michael Bronstein @ICLR2025 🇸🇬 (@mmbronstein) 's Twitter Profile Photo

Petar Veličković Transformers are also equivariant. In general, small/low-dimensional groups such as SE(3) or T(2) are easy to learn. Large groups like Sn are hopeless
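
(A minimal sketch of the sense of "equivariant" Petar and Michael point to: a Transformer encoder layer without positional encodings is permutation equivariant over its tokens. This is an illustrative check, not code from the thread.)

```python
# Sketch: verifying permutation equivariance of a position-free Transformer layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True).eval()

x = torch.randn(1, 10, 16)            # (batch, tokens, features), no positional encoding
perm = torch.randperm(10)

out_then_perm = layer(x)[:, perm]     # f(x), then permute the tokens
perm_then_out = layer(x[:, perm])     # permute the tokens, then apply f

print(torch.allclose(out_then_perm, perm_then_out, atol=1e-5))  # True
```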

Thomas Kipf (@tkipf) 's Twitter Profile Photo

Soledad Villar Chaitanya K. Joshi Whether or how learning a permutation symmetry through data augmentation can be beneficial depends heavily on the setup. Some positive examples are set prediction tasks, for which the permutation symmetric solution (set prediction architecture + permutation symmetric loss) often
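
(A minimal sketch, assumptions mine rather than Thomas's code, of the "set prediction architecture + permutation symmetric loss" recipe he refers to: a set loss made order-invariant via Hungarian matching, as used in DETR-style models.)

```python
# Sketch: permutation-symmetric set loss via optimal matching.
import torch
from scipy.optimize import linear_sum_assignment

def set_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (num_elements, dim); the loss is invariant to target ordering."""
    cost = torch.cdist(pred, target)                      # pairwise matching costs
    row, col = linear_sum_assignment(cost.detach().numpy())  # Hungarian matching
    return cost[row, col].mean()                          # loss over the best matching
```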

Thomas Kipf (@tkipf) 's Twitter Profile Photo

Petar Veličković Michael Bronstein Pix2Seq (set prediction tasks randomly serialized into sequence predicted using autoregressive model) vs DETR (set prediction architecture and loss) comparison is a good case study for why even baking in permutation symmetry into architecture and loss is not necessarily what

Max Zhdanov (@maxxxzdn) 's Twitter Profile Photo

Thomas Kipf Petar Veličković Michael Bronstein Recent works in point cloud processing do achieve state-of-the-art performance while breaking permutation symmetry to impose regular structure on unordered sets for better scaling (e.g. PointTransformer v3 arxiv.org/abs/2312.10035 or Erwin arxiv.org/abs/2502.17019).
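
(A minimal sketch of the kind of symmetry-breaking serialization these models rely on: imposing an order on an unordered point set with Morton / Z-order codes so it can be processed as a sequence. This is my illustration, not code from either paper.)

```python
# Sketch: serializing a point cloud along a Z-order (Morton) curve.
import numpy as np

def morton_order(points: np.ndarray, bits: int = 10) -> np.ndarray:
    """Return indices that sort an (N, 3) point cloud along a Z-order curve."""
    q = points - points.min(axis=0)                       # shift into the positive octant
    q = (q / (q.max(axis=0) + 1e-9) * (2**bits - 1)).astype(np.uint64)  # quantize to a grid

    codes = np.zeros(len(points), dtype=np.uint64)
    for b in range(bits):                                 # interleave coordinate bits
        for d in range(3):
            codes |= ((q[:, d] >> np.uint64(b)) & np.uint64(1)) << np.uint64(3 * b + d)
    return np.argsort(codes)

# points = np.random.rand(1024, 3)
# serialized = points[morton_order(points)]   # now treatable as an ordered sequence
```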

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Chaitanya K. Joshi I agree that strict equivariance isn't always necessary, e.g. in generative diffusion models, but it's essential in others, like ML potentials. The claim that equivariant models are slow doesn't hold in our field: top models like MACE (equivariant) and Orb (non-equivariant) have similar speeds.

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Rubén Ballester Petar Veličković Michael Bronstein I often see the claim that “SO(3) is easy to learn because it’s low-dimensional,” but to my knowledge, there’s no empirical or theoretical evidence supporting this. In fact, SO(3) is significantly harder to sample than SO(2), despite being only two dimensions higher.

Hannes Stärk (@hannesstaerk) 's Twitter Profile Photo

Chaitanya K. Joshi Xiang Fu Brandon Wood Nathan C. Frey Michael Bronstein Jason Yim Ilyes Batatia Ahmed Elhag Taco Cohen AF3/Boltz has little uncertainty left after the SE(3)-invariant trunk. A fun, related visualization is the x0 prediction trajectory. They always look like the one in this video, where the initial x0 prediction is almost the same as the one at the end of the denoising trajectory.
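
(For context, a minimal sketch of the quantity Hannes visualizes: the standard DDPM-style x0 estimate recovered at each denoising step from a noise-prediction network. The name `eps_model` is hypothetical; this is generic diffusion algebra, not AF3/Boltz internals.)

```python
# Sketch: x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t)
import torch

def x0_prediction(x_t: torch.Tensor, t: int, alpha_bar: torch.Tensor, eps_model) -> torch.Tensor:
    """Recover the model's current estimate of the clean sample at step t."""
    eps_hat = eps_model(x_t, t)
    return (x_t - torch.sqrt(1 - alpha_bar[t]) * eps_hat) / torch.sqrt(alpha_bar[t])

# Logging x0_prediction(...) at every step of the sampler gives the kind of
# trajectory video described above: early x0 estimates already close to the final one.
```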

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Michael Bronstein Rubén Ballester Petar Veličković I'd argue that non-equivariant models learn SO(3) well enough for generative modelling, where equivariance accuracy isn't critical. But there's evidence that it is hard to reach high equivariance accuracy for force fields. To me, "easy to learn" would mean learnable to any given accuracy.

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Mark Neumann Chaitanya K. Joshi I agree standardised tests are essential, and timing things can be surprisingly hard. I think we can agree that orb-v3-conservative-inf and MACE are in a similar speed range, meaning they can do a few nanoseconds/day for a few thousand atoms. I can quote directly from your paper.

Miguel Angel Bautista (@itsbautistam) 's Twitter Profile Photo

Chaitanya K. Joshi True, I guess “required” was too strong a choice of wording 🫢. I think nuance is absolutely needed; I'd argue it's probably what we needed in the first place.

Kevin K. Yang 楊凱筌 (@kevinkaichuang) 's Twitter Profile Photo

Do protein language models store different structural elements in factorizable subnetworks? To find out, we masked out PLM weights to suppress performance on CATH subcategories or secondary structure elements while maintaining performance on other sequences or residues.
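
(A minimal sketch of the general weight-masking idea described here, under my own assumptions rather than the paper's actual implementation: wrap a frozen PLM layer with a trainable per-weight mask, then optimize the mask to suppress one subset of sequences while preserving the rest.)

```python
# Sketch: trainable per-weight mask over a frozen protein language model layer.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Wraps a frozen linear layer with a trainable per-weight mask."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear.requires_grad_(False)       # keep the PLM weights fixed
        self.mask_logits = nn.Parameter(torch.zeros_like(linear.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.mask_logits)           # soft mask in [0, 1]
        return nn.functional.linear(x, self.linear.weight * mask, self.linear.bias)

# Training idea (hypothetical objective): raise the loss on the targeted CATH
# subset while keeping it low elsewhere, then threshold the mask to a subnetwork.
```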

Yoshua Bengio (@yoshua_bengio) 's Twitter Profile Photo

Today marks a big milestone for me. I'm launching LawZero - LoiZéro, a nonprofit focusing on a new safe-by-design approach to AI that could both accelerate scientific discovery and provide a safeguard against the dangers of agentic AI.