Chaitanya K. Joshi @ICLR2025 🇸🇬

@chaitjo

PhD student @Cambridge_CL. Research Intern @AIatMeta and @PrescientDesign. Interested in Deep Learning for biomolecule design. Organising @LoGConference.

ID: 166216664

Website: https://www.chaitjo.com/ · Joined: 13-07-2010 16:36:04

2.2K Tweets

7.7K Followers

1.1K Following

Chaitanya K. Joshi @ICLR2025 🇸🇬 (@chaitjo) 's Twitter Profile Photo

After a long hiatus, I've started blogging again! My first post was a difficult one to write, because I don't want to keep repeating what's already in papers. I tried to give some nuanced and (hopefully) fresh takes on equivariance and geometry in molecular modelling.

Soledad Villar (@soledadvillar5) 's Twitter Profile Photo

Chaitanya K. Joshi It's possible that data aug + a simpler model outperforms SE(3) or E(3) equivariance in practice. The effective sample complexity gain is a constant independent of the input size. But learning a permutation symmetry from data will never perform better IMO (unless the data is tiny)
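
(A minimal sketch, for context, of the alternative Soledad contrasts with built-in equivariance: training a plain, non-equivariant point-cloud model with random SO(3) rotations as data augmentation. The training-loop names are hypothetical.)

```python
# Sketch: SO(3) data augmentation instead of an equivariant architecture.
import torch
from scipy.spatial.transform import Rotation

def random_rotate(pos: torch.Tensor) -> torch.Tensor:
    """Apply one uniformly sampled 3D rotation to an (N, 3) point cloud."""
    R = torch.tensor(Rotation.random().as_matrix(), dtype=pos.dtype)
    return pos @ R.T

# Inside a hypothetical training loop:
# for pos, target in loader:
#     pos = random_rotate(pos)              # augmentation stands in for equivariance
#     loss = criterion(model(pos), target)
#     loss.backward(); optimizer.step()
```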

Petar Veličković (@petarv_93) 's Twitter Profile Photo

A great post as always from Chaitanya, pls go and read it 👇 It also made me reflect on a point... we've (as a field) done the equivariant approach a disservice by calling a model 'equivariant' only if it has 3D spatial symmetries -- and this is a (relatively) simple action

Michael Bronstein @ICLR2025 🇸🇬 (@mmbronstein) 's Twitter Profile Photo

Petar Veličković Transformers are also equivariant. In general, small/low-dimensional groups such as SE(3) or T(2) are easy to learn. Large groups like Sn are hopeless
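
(A minimal sketch of the sense of "equivariant" Petar and Michael point to: a Transformer encoder layer without positional encodings is permutation equivariant over its tokens. This is an illustrative check, not code from the thread.)

```python
# Sketch: verifying permutation equivariance of a position-free Transformer layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True).eval()

x = torch.randn(1, 10, 16)            # (batch, tokens, features), no positional encoding
perm = torch.randperm(10)

out_then_perm = layer(x)[:, perm]     # f(x), then permute the tokens
perm_then_out = layer(x[:, perm])     # permute the tokens, then apply f

print(torch.allclose(out_then_perm, perm_then_out, atol=1e-5))  # True
```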

Thomas Kipf (@tkipf) 's Twitter Profile Photo

Soledad Villar Chaitanya K. Joshi Whether or how learning a permutation symmetry through data augmentation can be beneficial depends heavily on the setup. Some positive examples are set prediction tasks, for which the permutation symmetric solution (set prediction architecture + permutation symmetric loss) often
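
(A minimal sketch, assumptions mine rather than Thomas's code, of the "set prediction architecture + permutation symmetric loss" recipe he refers to: a set loss made order-invariant via Hungarian matching, as used in DETR-style models.)

```python
# Sketch: permutation-symmetric set loss via optimal matching.
import torch
from scipy.optimize import linear_sum_assignment

def set_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (num_elements, dim); the loss is invariant to target ordering."""
    cost = torch.cdist(pred, target)                      # pairwise matching costs
    row, col = linear_sum_assignment(cost.detach().numpy())  # Hungarian matching
    return cost[row, col].mean()                          # loss over the best matching
```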

Thomas Kipf (@tkipf) 's Twitter Profile Photo

Petar Veličković Michael Bronstein Pix2Seq (set prediction tasks randomly serialized into sequence predicted using autoregressive model) vs DETR (set prediction architecture and loss) comparison is a good case study for why even baking in permutation symmetry into architecture and loss is not necessarily what

Max Zhdanov (@maxxxzdn) 's Twitter Profile Photo

Thomas Kipf Petar Veličković Michael Bronstein Recent works in point cloud processing do achieve state-of-the-art performance while breaking permutation symmetry to impose regular structure on unordered sets for better scaling (e.g. PointTransformer v3 arxiv.org/abs/2312.10035 or Erwin arxiv.org/abs/2502.17019).
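
(A minimal sketch of the kind of symmetry-breaking serialization these models rely on: imposing an order on an unordered point set with Morton / Z-order codes so it can be processed as a sequence. This is my illustration, not code from either paper.)

```python
# Sketch: serializing a point cloud along a Z-order (Morton) curve.
import numpy as np

def morton_order(points: np.ndarray, bits: int = 10) -> np.ndarray:
    """Return indices that sort an (N, 3) point cloud along a Z-order curve."""
    q = points - points.min(axis=0)                       # shift into the positive octant
    q = (q / (q.max(axis=0) + 1e-9) * (2**bits - 1)).astype(np.uint64)  # quantize to a grid

    codes = np.zeros(len(points), dtype=np.uint64)
    for b in range(bits):                                 # interleave coordinate bits
        for d in range(3):
            codes |= ((q[:, d] >> np.uint64(b)) & np.uint64(1)) << np.uint64(3 * b + d)
    return np.argsort(codes)

# points = np.random.rand(1024, 3)
# serialized = points[morton_order(points)]   # now treatable as an ordered sequence
```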

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Chaitanya K. Joshi I agree that strict equivariance isn't always necessary, e.g. in generative diffusion models, but it's essential in others, like ML potentials. The claim that equivariant models are slow doesn't hold in our field: top models like MACE (equivariant) and Orb (non-equivariant) have similar speeds.

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Rubén Ballester Petar Veličković Michael Bronstein I often see the claim that “SO(3) is easy to learn because it’s low-dimensional,” but to my knowledge, there’s no empirical or theoretical evidence supporting this. In fact, SO(3) is significantly harder to sample than SO(2), despite being only two dimensions higher.

Hannes Stärk (@hannesstaerk) 's Twitter Profile Photo

Chaitanya K. Joshi Xiang Fu Brandon Wood Nathan C. Frey Michael Bronstein Jason Yim Ilyes Batatia Ahmed Elhag Taco Cohen AF3/Boltz has little uncertainty left after the SE(3)-invariant trunk. A fun, related visualization is the x0 prediction trajectory. They always look like the one in this video, where the initial x0 prediction is almost the same as the one at the end of the denoising trajectory.
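
(For context, a minimal sketch of the quantity Hannes visualizes: the standard DDPM-style x0 estimate recovered at each denoising step from a noise-prediction network. The name `eps_model` is hypothetical; this is generic diffusion algebra, not AF3/Boltz internals.)

```python
# Sketch: x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t)
import torch

def x0_prediction(x_t: torch.Tensor, t: int, alpha_bar: torch.Tensor, eps_model) -> torch.Tensor:
    """Recover the model's current estimate of the clean sample at step t."""
    eps_hat = eps_model(x_t, t)
    return (x_t - torch.sqrt(1 - alpha_bar[t]) * eps_hat) / torch.sqrt(alpha_bar[t])

# Logging x0_prediction(...) at every step of the sampler gives the kind of
# trajectory video described above: early x0 estimates already close to the final one.
```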

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Michael Bronstein Rubén Ballester Petar Veličković I'd argue that non-equivariant models learn SO(3) well enough for generative modelling, where equivariance accuracy isn't critical. But there's evidence that it is hard to reach high equivariance accuracy for force fields. To me, "easy to learn" would mean learnable to any given accuracy.

Ilyes Batatia (@ilyesbatatia) 's Twitter Profile Photo

Mark Neumann Chaitanya K. Joshi I agree standardised tests are essential, and timing things can be surprisingly hard. I think we can agree that orb-v3-conservative-inf and MACE are in a similar speed range, meaning they can do a few nanoseconds/day for a few thousand atoms. I can quote directly from your paper.

Miguel Angel Bautista (@itsbautistam) 's Twitter Profile Photo

Chaitanya K. Joshi True, I guess “required” was too strong a choice of wording 🫢. I think nuance is absolutely needed; I'd argue it's probably what we needed in the first place.

Kevin K. Yang 楊凱筌 (@kevinkaichuang) 's Twitter Profile Photo

Do protein language models store different structural elements in factorizable subnetworks? To find out, we masked out PLM weights to suppress performance on CATH subcategories or secondary structure elements while maintaining performance on other sequences or residues.
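
(A minimal sketch of the general weight-masking idea described here, under my own assumptions rather than the paper's actual implementation: wrap a frozen PLM layer with a trainable per-weight mask, then optimize the mask to suppress one subset of sequences while preserving the rest.)

```python
# Sketch: trainable per-weight mask over a frozen protein language model layer.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Wraps a frozen linear layer with a trainable per-weight mask."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear.requires_grad_(False)       # keep the PLM weights fixed
        self.mask_logits = nn.Parameter(torch.zeros_like(linear.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.mask_logits)           # soft mask in [0, 1]
        return nn.functional.linear(x, self.linear.weight * mask, self.linear.bias)

# Training idea (hypothetical objective): raise the loss on the targeted CATH
# subset while keeping it low elsewhere, then threshold the mask to a subnetwork.
```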

Yoshua Bengio (@yoshua_bengio) 's Twitter Profile Photo

Today marks a big milestone for me. I'm launching LawZero - LoiZéro, a nonprofit focusing on a new safe-by-design approach to AI that could both accelerate scientific discovery and provide a safeguard against the dangers of agentic AI.