Alireza Mousavi @ ICLR 2025 (@alirezamh_)'s Twitter Profile
Alireza Mousavi @ ICLR 2025

@alirezamh_

CS PhD student @UofT and @VectorInst. Interested in deep learning theory.

ID: 1844793221240483848

Link: https://mousavih.github.io · Joined: 11-10-2024 17:32:44

28 Tweets

215 Followers

183 Following

MTL MLOpt (@mtl_mlopt)'s Twitter Profile Photo

Join us on Wednesday, November 13th, at 12:30 PM EDT for a talk by Alireza Mousavi (UofT) on "Learning and Optimization with Mean-Field Langevin Dynamics" at Mila - Institut québécois d'IA in Montreal
Alireza Mousavi @ ICLR 2025 (@alirezamh_)'s Twitter Profile Photo

So can someone ask o3 to prove that with high probability over initialization, gradient descent on ResNet and CIFAR10 converges to 0 loss?

Vector Institute (@vectorinst)'s Twitter Profile Photo

Congratulations to Vector-affiliated researchers Alireza Mousavi-Hosseini and Mohammed Adnan, who were named RBC Borealis Fellows. The RBC Borealis Fellowships Program represents excellence in Canadian AI research and innovation. We’re proud to see our affiliated researchers

Eshaan Nichani (@eshaannichani)'s Twitter Profile Photo

Excited to announce a new paper with Yunwei Ren, Denny Wu, Jason Lee! We prove a neural scaling law in the SGD learning of extensive width two-layer neural networks. arxiv.org/abs/2504.19983 🧵below (1/10)
Jason Lee (@jasondeanlee)'s Twitter Profile Photo

New work arxiv.org/abs/2506.05500 on learning multi-index models with Alex Damian and Joan Bruna. Multi-index models are of the form y = g(Ux), where U is an r × d matrix mapping from d dimensions to r dimensions, with d >> r, and g is an arbitrary function. Examples of multi-index models include any neural net
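As a quick illustration (my own sketch, not code from the paper): the defining property of a multi-index model y = g(Ux) is that the label depends on the d-dimensional input x only through its r-dimensional projection Ux. The dimensions, link function g, and random U below are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 100, 3  # ambient dimension d >> index dimension r
U = rng.standard_normal((r, d)) / np.sqrt(d)  # r x d projection matrix

def g(z):
    # an arbitrary link function on the r-dimensional projection
    return np.tanh(z[0]) + z[1] * z[2]

def multi_index(x):
    # y = g(Ux): the label depends on x only through Ux
    return g(U @ x)

x = rng.standard_normal(d)
y = multi_index(x)
```

Perturbing x along any direction orthogonal to the rows of U leaves the output unchanged, which is what makes the r-dimensional subspace the object to be learned.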

masani (@mohammadhamani)'s Twitter Profile Photo

Why does RL struggle with tasks requiring long reasoning chains? Because “bumping into” a correct solution becomes exponentially less likely as the number of reasoning steps grows. We propose an adaptive backtracking algorithm: AdaBack. 1/n
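A back-of-the-envelope version of the "exponentially less likely" claim (my illustration, not from the paper): if each reasoning step is sampled correctly with some probability p < 1, independently across steps, then the chance of completing an n-step chain end-to-end is p^n, which decays exponentially in n:

```python
def chain_success_prob(p: float, n: int) -> float:
    # Probability of sampling a fully correct n-step chain when each
    # step is independently correct with probability p.
    return p ** n

for n in (1, 5, 10, 20):
    print(f"n={n:2d}  P(success)={chain_success_prob(0.9, n):.4f}")
```

Even at 90% per-step accuracy, a 20-step chain succeeds only about 12% of the time, so sparse end-of-chain rewards become very hard to reach by chance as n grows.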

Bruno Mlodozeniec (@kayembruno)'s Twitter Profile Photo

NeurIPS Conference, why take the option to provide figures in the rebuttals away from the authors during the rebuttal period? Grounding the discussion in hard evidential data (like plots) makes resolving disagreements much easier for both the authors and the reviewers. Left: NeurIPS