Satoki (@sisforcollege)'s Twitter Profile
Satoki

@sisforcollege

TokyoTech 25D Dept. of Computer Science / DNN optimization site: riverstone496.github.io

ID: 1032205443878141954

Joined: 22-08-2018 09:58:50

636 Tweets

462 Followers

867 Following

ICLR 2025 (@iclr_conf)'s Twitter Profile Photo

Test of Time Winner: "Adam: A Method for Stochastic Optimization" by Diederik P. Kingma and Jimmy Ba. Adam revolutionized neural network training, enabling significantly faster convergence and more stable training across a wide variety of architectures and tasks.
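For reference, the Adam update rule from the award-winning paper fits in a few lines. This is a minimal, pure-Python sketch of the bias-corrected update, not the optimized implementation shipped in any framework; the toy quadratic at the end is illustrative only.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba): EMAs of the gradient and its square,
    bias-corrected, then a per-coordinate scaled update. t is 1-based."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(x) = x^2 starting from x = 5
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
# theta now sits near the minimum at 0
```

Note how the effective step size is roughly `lr` per coordinate early on (the m_hat / sqrt(v_hat) ratio is near ±1), which is the "sign-like" behavior that makes Adam robust across architectures.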

myamada0 (@myamada0)'s Twitter Profile Photo

We’re happy to share that members of our MLDS unit (OIST) will present several papers at #ICLR2025! Topics include brain-inspired representation learning, optimal transport, decentralized learning, anomaly detection, and LLM uncertainty quantification. Feel free to stop by

Torsten Hoefler 🇨🇭 (@thoefler)'s Twitter Profile Photo

Rio Yokota from Tokyo Tech talks about scaling laws for #HPC, #AI training, inference, and spending 💸. We're in the exponential scaling part of a logistic curve - when will we hit the bottom? Nice discussion and analogies between the fields 🤔.

Satoki (@sisforcollege)'s Twitter Profile Photo

In this paper, they experiment with the combination of Muon and μP, but since Muon is mathematically equivalent to Shampoo, the μP of Muon should correspond to the μP of Shampoo in our paper. arxiv.org/abs/2505.02222 μP for Shampoo: arxiv.org/abs/2312.12226
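For context, Muon's core operation is an approximate orthogonalization of the momentum matrix via a quintic Newton-Schulz iteration. A minimal NumPy sketch is below, using the coefficients from the public Muon reference code; this is an illustration of the technique, not either paper's exact implementation.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately map G to its nearest semi-orthogonal matrix (the polar
    factor U V^T), as Muon does to the momentum before the weight update."""
    a, b, c = 3.4445, -4.7750, 2.0315      # quintic coefficients from the public Muon code
    X = G / (np.linalg.norm(G) + eps)      # Frobenius norm bounds the spectral norm by 1
    transpose = X.shape[0] > X.shape[1]    # iterate on the wide orientation
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X  # pushes all singular values toward 1
    return X.T if transpose else X

rng = np.random.default_rng(0)
W = newton_schulz(rng.standard_normal((4, 6)))
```

The iteration is deliberately loose (singular values land near 1 rather than exactly at 1), which is cheap on GPUs and, per the Muon authors, does not hurt training.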

Satoki (@sisforcollege)'s Twitter Profile Photo

Wonderful Chopin mazurkas that let you feel the atmosphere of an Afanassiev concert exactly as it was. I'd love to hear this free, meditative playing inside Afanassiev's gravitational field again, but I suppose another visit to Japan is no longer possible... youtube.com/watch?v=8T1sID…

Anil Seth (@anilkseth)'s Twitter Profile Photo

1/3 Geoffrey Hinton once said that the future depends on some graduate student being suspicious of everything he says (via Lex Fridman). He also said that it was impossible to find biologically plausible approaches to backprop that scale well: radical.vc/geoffrey-hinto….

Satoki (@sisforcollege)'s Twitter Profile Photo

The technical paper for Gemini 2.5 mentions improvements in “signal propagation” and “optimization dynamics.” Those terms make it sound like theoretical insights have been applied, and if so, I’d be very curious to learn exactly what those insights are. storage.googleapis.com/deepmind-media…

Satoki (@sisforcollege)'s Twitter Profile Photo

I found a very interesting μP paper on the embedding LR. They propose a new embedding LR scale for when the vocab size is much larger than the width. arxiv.org/abs/2506.15025
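As background, standard μP prescribes width-dependent learning rates per layer type. The sketch below shows the usual Adam-style μP scaling (embedding LR independent of width, matrix-like layers scaled as 1/width); `mup_layer_lrs` is a hypothetical helper for illustration, and the vocab-size-dependent correction proposed in the linked paper is not reproduced here.

```python
def mup_layer_lrs(base_lr, base_width, width):
    """Per-layer learning rates under standard mu-P scaling for Adam-style
    optimizers (hypothetical helper, not from the linked paper):
    the embedding LR stays O(1) in width, while hidden and output
    matrices scale their LR as 1/width."""
    mult = width / base_width
    return {
        "embedding": base_lr,        # input embedding: width-independent
        "hidden": base_lr / mult,    # hidden matrices: LR proportional to 1/width
        "output": base_lr / mult,    # readout layer: LR proportional to 1/width
    }

# Tune at width 128, then transfer the schedule to width 1024
lrs = mup_layer_lrs(base_lr=3e-3, base_width=128, width=1024)
```

The point of this per-layer split is hyperparameter transfer: an LR tuned at the small base width remains near-optimal at the large width.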

Soumith Chintala (@soumithchintala)'s Twitter Profile Photo

considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default. If anyone wants to take a crack at it... github.com/pytorch/pytorc…

Taishi Nakamura@ICLR2025🇸🇬 (@setuna7777_2)'s Twitter Profile Photo

I won’t make it to ICML this year, but our work will be presented at the 2nd AI for Math Workshop @ ICML 2025. Huge thanks to my co‑author Satoki Ishikawa for presenting on my behalf. Please drop by if you’re around!

Satoki (@sisforcollege)'s Twitter Profile Photo

My proposal was accepted to the ACT-X program "Innovations in Mathematical and Information Sciences for Building Next-Generation AI." I will keep researching neural network optimization, aiming for a deeper understanding 😁

Satoki (@sisforcollege)'s Twitter Profile Photo

I'm updating awesome-second-order optimization. If you find important / interesting papers not cited in this repository, please let me know. github.com/riverstone496/…

Andrew Gordon Wilson (@andrewgwils)'s Twitter Profile Photo

Bach is so timeless because he wasn't writing for people, he was writing for a higher power. Try writing your next paper for God. Imagine how many rubbish papers we wouldn't see anymore. Your audience sees your every thought and intention. There would be no ego, no pretense.