Fabian Schaipp (@fschaipp)'s Twitter Profile
Fabian Schaipp

@fschaipp

working on optimization for machine learning. currently postdoc @inria_paris. sbatch and apero.

ID: 1286578476326236162

Link: https://fabian-sp.github.io/ · Joined: 24-07-2020 08:27:46

384 Tweets

929 Followers

627 Following

Fabian Schaipp (@fschaipp):

🚟 New blog post: On "infinite" learning-rate schedules and how to construct them from one checkpoint to the next fabian-sp.github.io/posts/2025/09/…

Alex Hägele (@haeggee):

Long in the making, finally released: Apertus-8B and Apertus-70B, trained on 15T tokens of open data from over 1800 languages. Unique opportunity in academia to work on and train LLMs across the full-stack. We managed to pull off a pretraining run with some fun innovations, ...

Ingwar Perowanowitsch (@perowinger94):

Berlin's transport senator Uta Bonde (CDU), speaking with Tagesspiegel about the safety of children's routes to school: “We cannot introduce a 30 km/h limit at our own discretion.” She is skeptical of car-free school streets like those in Paris. Instead, her advice to all children and young people: “Helmet

Pratyush Maini (@pratyushmaini):

If you’re scrambling a last-minute submission with an uncertain result, remember: putting it off is hard in the moment. It will sting for 10 minutes (because you care so deeply), but in 10 months you’ll be incredibly proud you made the scientifically rigorous call.

Andrei Semenov (@andreisemenov17):

Good to see SOAP and Muon being quite performant in another setting — training of Diffusion Models. Similarly to our benchmark, the authors find Prodigy a decent “proxy-optimizer” for tuning hyperparams of Adam-like methods arxiv.org/pdf/2510.19376
