Tim Lau (@timlautk) 's Twitter Profile
Tim Lau

@timlautk

Postdoc @Penn @PennMedicine @Wharton; Past Postdoc @ChicagoBooth; PhD @NorthwesternU Statistics & Data Science

ID: 2276510935

Link: http://timlautk.github.io · Joined: 04-01-2014 18:50:47

363 Tweets

476 Followers

1.1K Following

Lap Sum Chan (@sumlap) 's Twitter Profile Photo

🚨 Exciting news! 🚨 My first postdoc work on using highly correlated exposures for Mendelian Randomization (MR) is now in press at AJHG! 🧬 We developed MVMR-cML-SuSiE and discovered that glutamine and lipids play key roles in Alzheimer's disease. cell.com/ajhg/abstract/… 1/n

Weijie Su (@weijie444) 's Twitter Profile Photo

10 years ago, ML papers were math-heavy. Advice I got: less math, more empirics. Today, many ML/AI papers lack even a single math formula, let alone math thinking. My advice to young LLM researchers: do a little math if possible. It'll distinguish yours from the sea of LLM

Francis Bach (@bachfrancis) 's Twitter Profile Photo

How fast is gradient descent, *for real*? Some (partial) answers in this new blog post on scaling laws for optimization. francisbach.com/scaling-laws-o…
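A minimal numerical sketch of the classical answer (not from the blog post, which goes much further into scaling laws): on a strongly convex quadratic, gradient descent with step 1/L converges linearly. The matrix and constants below are made up for illustration.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x with eigenvalues in [mu, L].
# The error contracts by a factor of at most (1 - mu/L) per step.
rng = np.random.default_rng(0)
A = np.diag(np.linspace(1.0, 10.0, 5))   # mu = 1, L = 10 (illustrative)
x = rng.standard_normal(5)

step = 1.0 / 10.0                         # 1/L
f = lambda x: 0.5 * x @ A @ x

vals = []
for _ in range(200):
    x = x - step * (A @ x)                # gradient of f is A @ x
    vals.append(f(x))

# geometric decrease of the objective
assert vals[-1] < 1e-8 * vals[0]
```

The worst-case contraction factor is governed by the condition number L/mu; the blog post's point is that for "real" problems this classical bound can be far from the observed behavior.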

Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

Moreau-Yosida regularization smooths a function. It is the inf-convolution with a quadratic function. Its gradient is Id minus the proximal operator. en.wikipedia.org/wiki/Convex_co…

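Both facts in the tweet can be checked numerically for f(x) = |x|, whose Moreau envelope is the Huber function. This is a brute-force sketch (grid sizes and the test point are arbitrary choices, not from the thread):

```python
import numpy as np

# Moreau envelope of f(x) = |x| with parameter lam:
#   M(x) = min_y |y| + (x - y)^2 / (2 * lam)   (the Huber function),
# whose gradient is (x - prox_{lam*f}(x)) / lam, i.e. Id minus prox
# (up to the 1/lam scaling; the tweet's statement is the lam = 1 case).
lam = 1.0

def prox_abs(x, lam):
    # soft-thresholding: the prox of lam * |.|
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def envelope(x, lam):
    # brute-force inf-convolution over a fine grid
    y = np.linspace(-5, 5, 20001)
    return np.min(np.abs(y) + (x - y) ** 2 / (2 * lam))

x = 2.3
# closed form in the linear regime: |x| - lam/2 when |x| >= lam
assert abs(envelope(x, lam) - (abs(x) - lam / 2)) < 1e-3
# numerical gradient of the envelope matches (x - prox(x)) / lam
h = 1e-5
num_grad = (envelope(x + h, lam) - envelope(x - h, lam)) / (2 * h)
assert abs(num_grad - (x - prox_abs(x, lam)) / lam) < 1e-3
```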
Samuel Vaiter (@vaiter) 's Twitter Profile Photo

The Łojasiewicz inequality provides a way to control how close points are to the zeros of a real analytic function based on the value of the function itself. Extensions of this result to semialgebraic and o-minimal functions exist. matwbn.icm.edu.pl/ksiazki/sm/sm1…

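A toy check of the inequality's shape, for one specific function (the exponent and constant below hold only for this example and are not the general statement):

```python
import numpy as np

# Łojasiewicz inequality, toy instance: for f(x) = x^2 the zero set is
# Z = {0}, so dist(x, Z) = |x|, and dist(x, Z)^alpha <= C * |f(x)|
# holds with alpha = 2 and C = 1 on all of R.
xs = np.linspace(-2.0, 2.0, 1001)
dist_to_zeros = np.abs(xs)
assert np.all(dist_to_zeros ** 2 <= np.abs(xs ** 2) + 1e-12)
```

The content of the theorem is that *some* such exponent and constant exist locally for every real analytic (and, by the cited extensions, semialgebraic or o-minimal) function, which is far from obvious in general.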
Lap Sum Chan (@sumlap) 's Twitter Profile Photo

Day 0 at #IGES24 in Denver—already meeting fantastic people! I’ll be presenting my recent work, MVMR-cML-SuSiE (see last post), tomorrow and sharing a poster at #ASHG24 on Thursday. Looking forward to connecting with more of you this week!

Tim Lau (@timlautk) 's Twitter Profile Photo

Decentralized distributed training (of language models) at this scale is incredible. Very excited about how this will inform distributed optimization algorithms and techniques, and vice versa.

Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

Moreau's decomposition generalizes the orthogonal decomposition to general functions. It can also be generalized beyond Euclidean space using Bregman divergences in place of Euclidean distance. hal.archives-ouvertes.fr/hal-01076974/d…

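The Euclidean identity x = prox_f(x) + prox_{f*}(x) is easy to verify numerically. A sketch for f = the l1 norm, whose conjugate is the indicator of the l-infinity unit ball (the test vector is arbitrary):

```python
import numpy as np

# Moreau's decomposition: x = prox_f(x) + prox_{f*}(x) for any proper
# closed convex f. Here f = ||.||_1, so f* is the indicator of the
# l_inf unit ball and prox_{f*} is the projection onto [-1, 1]^n.
def prox_l1(x):
    # soft-thresholding = prox of the l1 norm
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def proj_box(x):
    # projection onto the l_inf unit ball = prox of the conjugate
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal(8)
assert np.allclose(prox_l1(x) + proj_box(x), x)
```

With f the indicator of a linear subspace this recovers the orthogonal decomposition the tweet mentions; the linked note covers the Bregman generalization.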
Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

Oldies but goldies: R. T. Rockafellar, Monotone Operators and the Proximal Point Algorithm, 1976. The proximal point algorithm is the most fundamental non-smooth optimization method and is the basis for many other proximal methods (FB, DR, ADMM, Dykstra, etc). en.wikipedia.org/wiki/Proximal_…

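The iteration itself is one line: x_{k+1} = prox_{t f}(x_k). A one-dimensional sketch on the nonsmooth f(x) = |x| (starting point and step size are arbitrary):

```python
import numpy as np

# Proximal point algorithm on f(x) = |x|: no gradients, just repeated
# prox steps, converging to the minimizer 0. The prox of t*|.| is
# soft-thresholding by t.
def prox_abs(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x, t = 10.0, 0.5
for _ in range(30):
    x = prox_abs(x, t)

assert x == 0.0   # each step shrinks |x| by t, reaching 0 exactly
```

Forward-backward, Douglas-Rachford, and ADMM can all be read as proximal point iterations on suitably chosen monotone operators, which is exactly Rockafellar's framing.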
Gabriel Peyré (@gabrielpeyre) 's Twitter Profile Photo

Oldies but goldies: J-J Moreau, Proximité et dualité dans un espace hilbertien, 1965. Moreau-Yosida regularization smooths a function by inf-convolution. en.wikipedia.org/wiki/Convex_co…

Jason Weston (@jaseweston) 's Twitter Profile Photo


🚨 New Paper 🚨
An Overview of Large Language Models for Statisticians
📝: arxiv.org/abs/2502.17814

- Dual perspectives on Statistics ➕ LLMs: Stat for LLM & LLM for Stat
- Stat for LLM: How statistical methods can improve LLM uncertainty quantification, interpretability,
The Wharton School (@wharton) 's Twitter Profile Photo

Congratulations to Prof. Weijie Su (@Weijie444) from our Department of Statistics and Data Science on being named an IMS Fellow! #WhartonProud

Weijie Su (@weijie444) 's Twitter Profile Photo

I just wrote a position paper on the relation between statistics and large language models: Do Large Language Models (Really) Need Statistical Foundations? arxiv.org/abs/2505.19145 Any comments are welcome. Thx

Tim Lau (@timlautk) 's Twitter Profile Photo

rohan anil See also my recent paper with Qi Long and Weijie Su on the numerical side using QDWH for polar decomposition arxiv.org/abs/2505.21799. The QDWH-SVD algorithm is included in one of the cited papers by Yuji Nakatsukasa et al. As a side note, it would be really great if QDWH is
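For context, the object being computed is the polar decomposition A = U H. A sketch via the SVD, which is the textbook route — QDWH (Nakatsukasa et al.) produces the same orthogonal factor by a dynamically weighted Halley iteration without a full SVD, which is the numerical point of the paper; this snippet only checks the decomposition itself:

```python
import numpy as np

# Polar decomposition A = U H: U has orthonormal columns, H is
# symmetric positive semidefinite. From the SVD A = W S V^T one gets
# U = W V^T and H = V S V^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

W, s, Vt = np.linalg.svd(A, full_matrices=False)
U = W @ Vt                      # orthogonal (polar) factor
H = Vt.T @ np.diag(s) @ Vt      # symmetric PSD factor

assert np.allclose(U @ H, A)
assert np.allclose(U.T @ U, np.eye(3))
```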

Jason Lee (@jasondeanlee) 's Twitter Profile Photo

TLDR: Heuristics such as clipping cause weird biases. Let's move away from heuristics to principled methods so at least we know what they are optimizing
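A tiny illustration of the kind of bias the tweet alludes to (the distribution and clip threshold below are made up): with per-sample clipping, the mean of clipped gradients systematically differs from the true mean gradient whenever the gradient distribution is skewed or heavy-tailed.

```python
import numpy as np

# Skewed "per-sample gradients": clipping each sample before averaging
# shifts the estimate, so the update no longer follows the true mean
# gradient direction/magnitude.
rng = np.random.default_rng(0)
g = rng.exponential(scale=2.0, size=100_000)

true_mean = g.mean()                          # unbiased estimate, ~2.0
clipped_mean = np.clip(g, -1.0, 1.0).mean()   # pulled toward the clip level

assert clipped_mean < true_mean               # systematic downward bias
```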

protim (@_proteuss_) 's Twitter Profile Photo

"PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective" x.com/timlautk/statu…