Kwangjun Ahn (@kwangjuna) 's Twitter Profile
Kwangjun Ahn

@kwangjuna

Senior Researcher at Microsoft Research // PhD from MIT EECS

ID: 1229766355622055936

Link: http://kjahn.mit.edu/ · Joined: 18-02-2020 13:56:04

40 Tweets

550 Followers

266 Following

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

To gain insights we study the simplest possible toy model, a baby sparse coding problem: Covariate x \in R^d is white noise plus a spike y \in R in a random coordinate. Goal: predict y given x. To solve the task a neural net has to learn threshold units for each coordinate. 3/8

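A minimal sketch of a data model matching this description (the dimension, noise scale, and Gaussian spike distribution below are illustrative assumptions, not necessarily the paper's exact setup):

import numpy as np

def sample_baby_sparse_coding(d, noise_std=1.0, rng=None):
    # Covariate: white noise in R^d ...
    rng = np.random.default_rng() if rng is None else rng
    x = noise_std * rng.standard_normal(d)
    # ... plus a spike of size y placed in one uniformly random coordinate.
    y = rng.standard_normal()       # the label to predict from x
    spike_coord = rng.integers(d)   # which coordinate carries the spike
    x[spike_coord] += y
    return x, y

x, y = sample_baby_sparse_coding(d=100)
print(x.shape, y)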
Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Experimental results for this baby sparse coding problem are striking: bias term starts moving ONLY at large learning rate! Moreover, something else happens at large lr: suddenly the training loss starts to oscillate! (Note: oscillations are well-documented in empirical DL.) 4/8

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Could it be that the emergence of the threshold units is related to these oscillations? Oscillations themselves have recently been under intense scrutiny by theoreticians under the name "Edge of Stability", a beautiful phenomenon discovered by Jeremy Cohen and co-authors. 5/8

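For intuition only, here is the textbook one-dimensional picture behind such oscillations, an illustration rather than the paper's model: gradient descent on f(w) = lam * w^2 / 2 multiplies the iterate by (1 - lr*lam) each step, so it decays monotonically for lr*lam < 1, oscillates while still shrinking for 1 < lr*lam < 2, and diverges past lr*lam = 2, the classical stability threshold that Edge of Stability analyses revolve around.

import numpy as np

def gd_on_quadratic(lr, lam=1.0, w0=1.0, steps=8):
    # Gradient descent on f(w) = lam * w**2 / 2, whose gradient is lam * w.
    w = w0
    traj = [w]
    for _ in range(steps):
        w -= lr * lam * w          # equivalently: w *= (1 - lr * lam)
        traj.append(w)
    return np.array(traj)

print(gd_on_quadratic(lr=0.5))   # monotone decay        (lr * lam < 1)
print(gd_on_quadratic(lr=1.9))   # oscillates, shrinks   (1 < lr * lam < 2)
print(gd_on_quadratic(lr=2.1))   # oscillates, diverges  (lr * lam > 2)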
Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Indeed, this is exactly what our paper does: we directly connect the emergence of threshold units to the Edge of Stability phenomenon. What comes next in this story does not fit well in tweet format; I guess that's why there is a paper :-). 6/8

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Key highlight of our story: we discover a phase transition for neural network learning at lr 8π/d^2. Emergence (for baby sparse coding) happens iff lr > 8π/d^2 ... Of course, the bigger story about general-purpose circuits remains fully open. We just made a tiny step. 7/8

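To make the stated condition concrete, a tiny sketch that only evaluates the threshold as written in the tweet (the constant 8π and its derivation come from the paper; the dimension below is arbitrary):

import math

def above_emergence_threshold(lr, d):
    # Phase-transition condition as stated above: emergence iff lr > 8*pi / d**2.
    return lr > 8 * math.pi / d ** 2

d = 100
print(8 * math.pi / d ** 2)                  # critical learning rate, about 0.0025
print(above_emergence_threshold(0.01, d))    # True: above the threshold
print(above_emergence_threshold(0.001, d))   # False: below the threshold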
Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

Project was led by three incredible MIT students, Kwangjun Ahn, Sinho Chewi and Felipe Suárez Colmenares. I cannot recommend them strongly enough. Project went so far beyond what I expected to be true at the beginning, let alone what would be *provable*. Such a pleasure to work with them. 8/8

Francesco Orabona (@bremen79) 's Twitter Profile Photo

Parameter-free optimization: 11 years of research and almost no code...

So, I wrote a PyTorch library and plan to add all the parameter-free algos I know!

Currently with COCOB and KT: old but sometimes even better than some newer variants 😉

github.com/bremen79/param…

Please retweet!
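
For readers who have not seen a parameter-free method, below is a minimal one-dimensional sketch of the KT coin-betting idea, a simplified version of the standard construction with (sub)gradients assumed to lie in [-1, 1]; it is not the API of the library linked above:

def kt_coin_betting(grad_fn, steps=2000, initial_wealth=1.0):
    # KT bettor: at each round, bet a fraction of current wealth equal to the
    # running average of past "coin outcomes" c_t = -g_t. No learning rate anywhere.
    wealth = initial_wealth
    sum_c = 0.0
    avg_x = 0.0
    for t in range(1, steps + 1):
        x = (sum_c / t) * wealth       # prediction (bet) for round t
        g = grad_fn(x)                 # (sub)gradient feedback at x, assumed in [-1, 1]
        c = -g
        wealth += c * x                # wealth after settling this round's bet
        sum_c += c
        avg_x += (x - avg_x) / t       # running average of iterates
    return avg_x

# Example: minimize f(x) = |x - 0.3| with no step size to tune.
print(kt_coin_betting(lambda x: 1.0 if x > 0.3 else -1.0))  # approximately 0.3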
Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

My group is hiring a large cohort of interns for the summer of 2024 to work on the Foundations of Large Language Models! Come help us uncover the new physics of A.I. to improve LLM building practices! (Pic below from our NeurIPS 2023 paper w. interns) jobs.careers.microsoft.com/global/en/job/…

Kwangjun Ahn (@kwangjuna) 's Twitter Profile Photo

Excited to share our NeurIPS paper on (theoretically) understanding in-context learning based on linear transformers! Please check out the details in arxiv.org/abs/2306.00297

Kwangjun Ahn (@kwangjuna) 's Twitter Profile Photo

Excited to share our NeurIPS paper that Sebastien Bubeck mentioned in his post: arxiv.org/abs/2212.07469

Also check out a NeurIPS paper on understanding SAM (a companion paper!): arxiv.org/abs/2305.15287

My talk video from INFORMS about these works: youtu.be/TMmpeVBbD7o?si…
Ahmad Beirami (@abeirami) 's Twitter Profile Photo

If you're at #NeurIPS2023, Kwangjun Ahn will be presenting his work on SpecTr++ at the Optimal Transport workshop, where he discusses improved transport plans for speculative decoding.

Aaron Defazio (@aaron_defazio) 's Twitter Profile Photo

Exciting new paper by Kwangjun Ahn and Ashok Cutkosky: Adam with model exponential moving average is effective for nonconvex optimization (arxiv.org/pdf/2405.18199). This approach to analyzing Adam is extremely promising IMHO.
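
For readers unfamiliar with the model-EMA ingredient, a minimal sketch of keeping an exponential moving average of the iterates alongside training; plain SGD stands in for Adam here, and the toy objective and decay value are illustrative choices, not the paper's analysis:

import numpy as np

def train_with_model_ema(grad_fn, w0, lr=0.05, decay=0.99, steps=500):
    # Run a base optimizer on w while keeping an exponential moving average
    # of the parameters; the EMA weights are what you evaluate/deploy.
    w = np.asarray(w0, dtype=float)
    w_ema = w.copy()
    for _ in range(steps):
        w = w - lr * grad_fn(w)                    # base optimizer step (plain SGD here)
        w_ema = decay * w_ema + (1 - decay) * w    # model EMA update
    return w, w_ema

# Toy quadratic: minimize ||w||^2 / 2, whose gradient is w.
last_w, ema_w = train_with_model_ema(lambda w: w, w0=np.ones(5))
print(last_w, ema_w)   # both near the origin; the EMA trails the last iterate smoothly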

Kwangjun Ahn (@kwangjuna) 's Twitter Profile Photo

In our ICML 2024 paper, joint w/ Zhiyu Zhang, Yunbum Kook, and Yan Dai, we provide a new perspective on the Adam optimizer based on online learning. In particular, our perspective shows the importance of Adam's key components. (video: youtu.be/AU39SNkkIsA)

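As a reminder of which components are meant, here is the standard Adam update with its pieces labeled (textbook form with common default hyperparameters; this is not the paper's online-learning derivation):

import numpy as np

def adam_step(w, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # The key components of Adam:
    m = beta1 * m + (1 - beta1) * g               # 1) momentum: EMA of gradients
    v = beta2 * v + (1 - beta2) * g ** 2          # 2) EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # 3) bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # 4) per-coordinate adaptive step
    return w, m, v

# Toy run on f(w) = ||w||^2 / 2 (gradient is w itself).
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 501):
    w, m, v = adam_step(w, w, m, v, t)
print(w)   # has moved close to the minimizer at the origin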
Kwangjun Ahn (@kwangjuna) 's Twitter Profile Photo

Come to my presentation of our ICML 2024 paper tmrw at 1:30–3 pm!
We provide a new perspective on the Adam optimizer based on online learning. In particular, our perspective shows the importance of Adam's key components. (video: youtu.be/AU39SNkkIsA)