Mathieu Dagréou (@mat_dag) 's Twitter Profile
Mathieu Dagréou

@mat_dag

Ph.D. student at @Inria_Saclay working on Optimization and Machine Learning. @matdag.bsky.social

ID: 1147917068102184960

Website: http://matdag.github.io · Joined: 07-07-2019 17:15:31

377 Tweets

469 Followers

555 Following

Samuel Vaiter (@vaiter) 's Twitter Profile Photo

There exists a strictly increasing, continuous function f:[0,1]→[0,1] whose derivative is 0 almost everywhere. jstor.org/stable/2978047…

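A classical example of such a singular function (not necessarily the one in the linked JSTOR article) is Minkowski's question-mark function, written here in LaTeX for x in (0,1] with continued-fraction expansion x = [0; a_1, a_2, ...]:

?(x) = 2 \sum_{k \ge 1} (-1)^{k+1} \, 2^{-(a_1 + a_2 + \cdots + a_k)}, \qquad ?(0) = 0.

It is continuous and strictly increasing from [0,1] onto [0,1], yet its derivative vanishes Lebesgue-almost everywhere.
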
Centre Inria de Paris (@inria_paris) 's Twitter Profile Photo

🏆 #Distinction | Congratulations to Gérard Biau (Centre Inria Sorbonne Université), director of #SCAI and specialist in the statistical dynamics of AI algorithms, who has been elected to the Académie des sciences 👏. sorbonne-universite.fr/presse/gerard-…

Pierre Ablin (@pierreablin) 's Twitter Profile Photo

🍏🍏🍏 Come work with us at Apple Machine Learning Research! 🍏🍏🍏 Our team focuses on curiosity-based, open research. We work on several topics, including LLMs, optimization, optimal transport, uncertainty quantification, and generative modeling. More info 👇

Samuel Vaiter (@vaiter) 's Twitter Profile Photo

When optimization problems have multiple minima, algorithms favor specific solutions due to their implicit bias. For ordinary least squares (OLS), gradient descent inherently converges to the minimal norm solution among all possible solutions. fa.bianp.net/blog/2022/impl…
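
A quick numerical check of this implicit bias (a minimal NumPy sketch, not taken from the linked post): run gradient descent from zero on an underdetermined least-squares problem and compare the final iterate with the pseudo-inverse (minimum-norm) solution.

import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                              # underdetermined: infinitely many exact solutions
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x = np.zeros(d)                            # initialization at zero is what gives the min-norm limit
step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L step for f(x) = 0.5 * ||Ax - b||^2
for _ in range(20000):
    x -= step * A.T @ (A @ x - b)          # plain gradient descent on the OLS loss

x_minnorm = np.linalg.pinv(A) @ b          # minimum-norm interpolating solution
print(np.linalg.norm(x - x_minnorm))       # essentially 0: GD converged to the min-norm solution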

Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

Théo Uscidda (@theo_uscidda) 's Twitter Profile Photo

Our work on geometric disentangled representation learning has been accepted to ICLR 2025! 🎊 See you in Singapore if you want to understand this GIF better :)

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

Learning rate schedulers used to be a big mystery. Now you can just take a guarantee for *convex non-smooth* problems (from arxiv.org/abs/2310.07831), and it gives you *precisely* what you see in training large models. See this empirical study: arxiv.org/abs/2501.18965 1/3

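For intuition (a toy sketch, not the actual bound from arxiv.org/abs/2310.07831 or the empirical study): run the subgradient method on a convex non-smooth problem with a warmup-stable-decay schedule; the loss plateaus during the constant phase and drops sharply during the cooldown, qualitatively like large-model training curves.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50) / np.sqrt(50)
b = A @ x_true + 0.1 * rng.standard_normal(200)
f = lambda x: np.abs(A @ x - b).mean()              # convex, non-smooth objective

T, warmup, cooldown, peak = 2000, 100, 400, 0.05
def lr(t):                                          # warmup-stable-decay schedule
    if t < warmup:
        return peak * (t + 1) / warmup
    if t < T - cooldown:
        return peak
    return peak * (T - t) / cooldown                # linear cooldown to zero

x = np.zeros(50)
losses = []
for t in range(T):
    g = A.T @ np.sign(A @ x - b) / len(b)           # a subgradient of f at x
    x -= lr(t) * g
    losses.append(f(x))

print(losses[T - cooldown - 1], losses[-1])         # the loss drops during the cooldown
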
Fabian Schaipp (@fschaipp) 's Twitter Profile Photo

Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 arxiv.org/abs/2501.18965

Alex Hägele (@haeggee) 's Twitter Profile Photo

A really fun project to work on. Looking at these plots side-by-side still amazes me! How well can **convex optimization theory** match actual LLM runs? My favorite points of our paper on the agreement for LR schedules in theory and practice: 1/n

Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

Optimization algorithms come with many flavors depending on the structure of the problem. Smooth vs non-smooth, convex vs non-convex, stochastic vs deterministic, etc. en.wikipedia.org/wiki/Mathemati…

Mathurin Massias (@mathusmassias) 's Twitter Profile Photo

It was received quite enthusiastically here, so time to share it again!!! Our #ICLR2025 blog post on Flow Matching was published yesterday: iclr-blogposts.github.io/2025/blog/cond… My PhD student Anne Gagneux will present it tomorrow at ICLR, 👉 poster session 4, 3 pm, #549 in Hall 3/2B 👈

Samuel Vaiter (@vaiter) 's Twitter Profile Photo

📣 New preprint 📣 **Differentiable Generalized Sliced Wasserstein Plans** w/ L. Chapel, Romain Tavenard. We propose a Generalized Sliced Wasserstein method that provides an approximate transport plan and admits a differentiable approximation. arxiv.org/abs/2505.22049 1/5

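For context, a sketch of the standard sliced Wasserstein distance (not the generalized, differentiable plan construction of the preprint): project both point clouds onto random directions and average the 1D optimal transport costs, which reduce to sorting.

import numpy as np

def sliced_wasserstein(x, y, n_proj=200, seed=0):
    # Monte-Carlo estimate of the sliced 2-Wasserstein distance between two
    # equal-size point clouds x, y of shape (n, d) with uniform weights.
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # random directions on the sphere
    sw2 = 0.0
    for t in theta:
        px, py = np.sort(x @ t), np.sort(y @ t)             # in 1D, the OT plan just sorts both sides
        sw2 += np.mean((px - py) ** 2)
    return np.sqrt(sw2 / n_proj)

x = np.random.default_rng(1).standard_normal((500, 3))
y = np.random.default_rng(2).standard_normal((500, 3)) + 1.0
print(sliced_wasserstein(x, y))    # ≈ 1 for this mean shift of norm √3 (slicing averages squared 1D shifts)
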
Matthieu Terris (@matthieuterris) 's Twitter Profile Photo

🧵 I'll be at CVPR next week presenting our FiRe work 🔥 TL;DR: We go beyond denoising models in PnP with more general restoration (e.g. deblurring) models! A starting observation is that images are not fixed points of restoration models:
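
For readers less familiar with PnP, a generic plug-and-play proximal-gradient sketch (not the FiRe algorithm; forward, adjoint and denoiser are hypothetical user-supplied operators):

def pnp_pgd(y, forward, adjoint, denoiser, step=1.0, n_iter=50):
    # Plug-and-Play proximal gradient: gradient step on the data-fidelity
    # term 0.5 * ||forward(x) - y||^2, then a learned restoration model in
    # place of the proximal operator of an explicit prior.
    x = adjoint(y)                        # crude initialization
    for _ in range(n_iter):
        grad = adjoint(forward(x) - y)    # gradient of the data-fidelity term
        x = denoiser(x - step * grad)     # plug in the restoration model
    return x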

Waïss Azizian (@wazizian) 's Twitter Profile Photo

❓ How long does SGD take to reach the global minimum on non-convex functions? With Franck Iutzeler, J. Malick, P. Mertikopoulos, we tackle this fundamental question in our new ICML 2025 paper: "The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes"

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

I want to address one very common misconception about optimization. I often hear that (approximately) preconditioning with the Hessian diagonal is always a good thing. It's not. In fact, finding a good preconditioner is an open problem, which I think deserves more attention. 1/4
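
A toy illustration of the point (mine, not from the thread): on a quadratic with strong off-diagonal coupling, the Hessian diagonal is the identity, so diagonal preconditioning changes nothing even though the problem is badly conditioned.

import numpy as np

# f(x) = 0.5 * x^T H x with strong off-diagonal curvature
H = np.array([[1.0, 0.99],
              [0.99, 1.0]])
print(np.diag(H))                          # [1. 1.] -> the diagonal preconditioner is the identity
print(np.linalg.cond(H))                   # ~199    -> yet the problem is ill-conditioned

x = np.array([1.0, -1.0])                  # the ill-conditioned eigendirection
step = 1.0 / np.linalg.eigvalsh(H)[-1]     # 1/L step; the diagonal "preconditioner" cannot improve it
for _ in range(100):
    x -= step * (H @ x)                    # (diagonally preconditioned) gradient descent
print(np.linalg.norm(x))                   # ~0.85, down from ~1.41: progress is slow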

Mathieu Blondel (@mblondel_ml) 's Twitter Profile Photo

Back from MLSS Senegal 🇸🇳, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet 🙏 My slides are here github.com/diffprog/slide…

Rudy Morel (@rdmorel) 's Twitter Profile Photo

For evolving unknown PDEs, ML models are trained on next-state prediction. But do they actually learn the time dynamics: the "physics"? Check out our poster (W-107) at #ICML2025 this Wed, Jul 16. Our "DISCO" model learns the physics while staying SOTA on next-state prediction!

Fabian Schaipp (@fschaipp) 's Twitter Profile Photo

🚟 New blog post: On "infinite" learning-rate schedules and how to construct them from one checkpoint to the next fabian-sp.github.io/posts/2025/09/…
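
As I understand the idea (a minimal sketch under my own assumptions, see the post for the actual construction): the main run uses a schedule with no fixed horizon, warmup then constant forever, and any checkpoint can be turned into a finished model by branching off a short cooldown.

def main_lr(t, peak=1e-3, warmup=100):
    # "infinite" schedule: warmup, then constant with no total-step budget
    return peak * min(1.0, (t + 1) / warmup)

def cooldown_lr(t, branch_step, peak=1e-3, cooldown=200):
    # learning rate inside a cooldown branch started at branch_step
    frac = (t - branch_step) / cooldown
    return peak * max(0.0, 1.0 - frac)     # linear decay to zero over `cooldown` steps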

Konstantin Mishchenko (@konstmish) 's Twitter Profile Photo

Nesterov dropped a new paper last week on what functions can be optimized with gradient descent. The idea is simple: we know GD can optimize both nonsmooth (bounded grads) and smooth (Lipschitz grads) functions, but smooth+nonsmooth satisfies neither property, so what can we do?

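A one-line example of the gap (mine, not from the paper):

f(x) = x^2 + |x|,    f'(x) = 2x + sign(x)  for x ≠ 0.

The (sub)gradient is neither globally bounded (the 2x term grows) nor globally Lipschitz (sign jumps at 0), so neither of the two classical gradient-descent analyses applies as-is.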