Satoki (@sisforcollege)'s Twitter Profile
Satoki

@sisforcollege

TokyoTech 25D Dept. of Computer Science / DNN optimization site: riverstone496.github.io

ID: 1032205443878141954

Joined: 22-08-2018 09:58:50

636 Tweets

462 Followers

867 Following

ICLR 2025 (@iclr_conf)'s Twitter Profile Photo

Test of Time Winner: "Adam: A Method for Stochastic Optimization" by Diederik P. Kingma and Jimmy Ba. Adam revolutionized neural network training, enabling significantly faster convergence and more stable training across a wide variety of architectures and tasks.
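For reference, the Adam update rule from the award-winning paper fits in a few lines. This is a minimal, pure-Python sketch of the bias-corrected update, not the optimized implementation shipped in any framework; the toy quadratic at the end is illustrative only.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba): EMAs of the gradient and its square,
    bias-corrected, then a per-coordinate scaled update. t is 1-based."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(x) = x^2 starting from x = 5
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
# theta now sits near the minimum at 0
```

Note how the effective step size is roughly `lr` per coordinate early on (the m_hat / sqrt(v_hat) ratio is near ±1), which is the "sign-like" behavior that makes Adam robust across architectures.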

myamada0 (@myamada0)'s Twitter Profile Photo

We’re happy to share that members of our MLDS unit (OIST) will present several papers at #ICLR2025! Topics include brain-inspired representation learning, optimal transport, decentralized learning, anomaly detection, and LLM uncertainty quantification. Feel free to stop by

Torsten Hoefler 🇨🇭 (@thoefler)'s Twitter Profile Photo

Rio Yokota from Tokyo Tech talks about scaling laws for #HPC, #AI training, inference, and spending 💸. We're in the exponential scaling part of a logistic curve - when will we hit the bottom? Nice discussion and analogies between the fields 🤔.

Satoki (@sisforcollege)'s Twitter Profile Photo

In this paper, they experiment with the combination of Muon and μP, but since Muon is mathematically equivalent to Shampoo, the μP of Muon should correspond to the μP of Shampoo in our paper. arxiv.org/abs/2505.02222 μP for Shampoo: arxiv.org/abs/2312.12226
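For context, Muon's core operation is an approximate orthogonalization of the momentum matrix via a quintic Newton-Schulz iteration. A minimal NumPy sketch is below, using the coefficients from the public Muon reference code; this is an illustration of the technique, not either paper's exact implementation.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately map G to its nearest semi-orthogonal matrix (the polar
    factor U V^T), as Muon does to the momentum before the weight update."""
    a, b, c = 3.4445, -4.7750, 2.0315      # quintic coefficients from the public Muon code
    X = G / (np.linalg.norm(G) + eps)      # Frobenius norm bounds the spectral norm by 1
    transpose = X.shape[0] > X.shape[1]    # iterate on the wide orientation
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # pushes all singular values toward 1
    return X.T if transpose else X

rng = np.random.default_rng(0)
W = newton_schulz(rng.standard_normal((4, 6)))
```

The iteration is deliberately loose (singular values land near 1 rather than exactly at 1), which is cheap on GPUs and, per the Muon authors, does not hurt training.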

Satoki (@sisforcollege)'s Twitter Profile Photo

Wonderful Chopin mazurkas that let you feel the atmosphere of an Afanassiev concert exactly as it was. I'd love to hear this free, meditative playing inside Afanassiev's gravitational field again, but I suppose another visit to Japan is no longer possible... youtube.com/watch?v=8T1sID…

Anil Seth (@anilkseth)'s Twitter Profile Photo

1/3 Geoffrey Hinton once said that the future depends on some graduate student being suspicious of everything he says (via Lex Fridman). He also said that it was impossible to find biologically plausible approaches to backprop that scale well: radical.vc/geoffrey-hinto….

Satoki (@sisforcollege)'s Twitter Profile Photo

The technical paper for Gemini 2.5 mentions improvements in “signal propagation” and “optimization dynamics.” Those terms make it sound like theoretical insights have been applied, and if so, I’d be very curious to learn exactly what those insights are. storage.googleapis.com/deepmind-media…

Satoki (@sisforcollege)'s Twitter Profile Photo

I found a very interesting μP paper on the embedding LR. They propose a new embedding LR scale for when the vocab size is much larger than the width. arxiv.org/abs/2506.15025
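As background, standard μP prescribes width-dependent learning rates per layer type. The sketch below shows the usual Adam-style μP scaling (embedding LR independent of width, matrix-like layers scaled as 1/width); `mup_layer_lrs` is a hypothetical helper for illustration, and the vocab-size-dependent correction proposed in the linked paper is not reproduced here.

```python
def mup_layer_lrs(base_lr, base_width, width):
    """Per-layer learning rates under standard mu-P scaling for Adam-style
    optimizers (hypothetical helper, not from the linked paper):
    the embedding LR stays O(1) in width, while hidden and output
    matrices scale their LR as 1/width."""
    mult = width / base_width
    return {
        "embedding": base_lr,        # input embedding: width-independent
        "hidden": base_lr / mult,    # hidden matrices: LR proportional to 1/width
        "output": base_lr / mult,    # readout layer: LR proportional to 1/width
    }

# Tune at width 128, then transfer the schedule to width 1024
lrs = mup_layer_lrs(base_lr=3e-3, base_width=128, width=1024)
```

The point of this per-layer split is hyperparameter transfer: an LR tuned at the small base width remains near-optimal at the large width.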

Soumith Chintala (@soumithchintala)'s Twitter Profile Photo

considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default. If anyone wants to take a crack at it... github.com/pytorch/pytorc…

Taishi Nakamura@ICLR2025🇸🇬 (@setuna7777_2)'s Twitter Profile Photo

I won’t make it to ICML this year, but our work will be presented at the 2nd AI for Math Workshop @ ICML 2025. Huge thanks to my co‑author Satoki Ishikawa for presenting on my behalf. Please drop by if you’re around!

Satoki (@sisforcollege)'s Twitter Profile Photo

My proposal was accepted to the ACT-X program "Innovations in Mathematical and Information Sciences for Building Next-Generation AI." I will keep researching neural network optimization, aiming for a deeper understanding 😁

Satoki (@sisforcollege)'s Twitter Profile Photo

I'm updating awesome-second-order optimization. If you find important / interesting papers not cited in this repository, please let me know. github.com/riverstone496/…

Andrew Gordon Wilson (@andrewgwils)'s Twitter Profile Photo

Bach is so timeless because he wasn't writing for people, he was writing for a higher power. Try writing your next paper for God. Imagine how many rubbish papers we wouldn't see anymore. Your audience sees your every thought and intention. There would be no ego, no pretense.