Leandro von Werra (@lvwerra)'s Twitter Profile
Leandro von Werra

@lvwerra

Head of research @huggingface

ID: 1105171148382326789

Link: http://www.github.com/lvwerra · Joined: 11-03-2019 18:18:29

1.1K Tweets

8.8K Followers

360 Following

Joël Niklaus (@joelniklaus)'s Twitter Profile Photo

✨ Very happy to announce I'm joining Hugging Face next week! 🤗 I'm based in the amazing Bern office 🇨🇭🏔️ alongside Leandro von Werra, Lewis Tunstall, and Andi Marafioti, working with the Language Model Dataset team to champion open datasets and models while supporting the open-science community.

Jaisidh Singh (@jaisidhsingh)'s Twitter Profile Photo

The Ultra-Scale Playbook is REALLY good. Gives clear & specific info about LLM training tricks at scale as consecutive optimisation cascades. Gets a flow going.

Some notes I made so far (very chill, non-rigorous, basically patching pre-existing info in my head):
Leandro von Werra (@lvwerra)'s Twitter Profile Photo

MoE Sparsity over time: interesting to see the field starting quite conservative and then pushing more and more sparsity. 

Some observations:

> Early models like Mixtral, DBRX, and Grok-1 went for ~25% sparsity across model sizes. 

> Llama 4 Scout and Maverick had the same
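To make the sparsity numbers above concrete, here is a minimal sketch (not from the thread) that reads "MoE sparsity" as the fraction of parameters active per token out of total parameters. The parameter counts are approximate public figures, used purely for illustration.

```python
# Illustrative: MoE "sparsity" as active parameters / total parameters.
# Counts below are approximate public figures, not exact values.

def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Fraction of total parameters that are active for each token."""
    return active_params_b / total_params_b

# Mixtral 8x7B: ~13B active of ~47B total (2 of 8 experts routed per token)
mixtral = active_fraction(13, 47)

# A much sparser recent design, e.g. DeepSeek-V3: ~37B active of ~671B total
deepseek_v3 = active_fraction(37, 671)

print(f"Mixtral 8x7B: ~{mixtral:.0%} active")   # roughly a quarter
print(f"DeepSeek-V3:  ~{deepseek_v3:.0%} active")  # far sparser
```

Under this definition, "~25% sparsity" in the tweet means roughly a quarter of the model's weights fire per token; later models push that fraction down by an order of magnitude.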
Thibaud Frere (@thibaudfrere)'s Twitter Profile Photo

🎉 New chapter: just started at Hugging Face! Excited to help science teams bring their ML research to life through interactive demos and visualizations. Ready to dive in!

Thibaud Frere (@thibaudfrere)'s Twitter Profile Photo

🚀 First week at Hugging Face: shipped improvements to The Ultra-Scale Playbook!

✅ Dark mode
✅ Some mobile responsiveness
✅ Performance fixes

Your complete guide to scaling #LLMs in 2025 👇 huggingface.co/spaces/nanotro…

Leandro von Werra (@lvwerra)'s Twitter Profile Photo

AI for Science is one of the most exciting applications for AI. Accelerating the progress of science has potentially a far greater positive impact for society than yet another chatbot. There are so many startups popping up in the last 12 months tackling the most challenging and

Leandro von Werra (@lvwerra)'s Twitter Profile Photo

The Ultra-Scale Playbook just got a beautiful update!

- official dark mode 
- mobile compatible
- more responsive overall

You can order it as a book or get the PDF with a pro subscription.

huggingface.co/spaces/nanotro…

Not a dark mode fan but can't deny that it looks really cool:
Hanna Yukhymenko (@a_yukh)'s Twitter Profile Photo

FineWeb has many more use cases than you can imagine! I would highlight the impact on multilingual training: we used FineWeb2 for the first (and future👀) Ukrainian-focused MamayLM releases and it helped a lot🤗 Thanks to FW2, we will 100% see more multilingual models in the future!

Leandro von Werra (@lvwerra)'s Twitter Profile Photo

LLM pretraining data processing has evolved over time:

First: the more data the better, with some light heuristic filtering (C4, Gopher, RedPajama, The Pile, FineWeb, Dolma)

Recently: remove all the bad data very aggressively with LLM classifiers, but in return replay data to train longer
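The "light heuristic filtering" era can be sketched with a toy filter in the spirit of C4/Gopher-style rules: length bounds, mean word length, and character-composition checks. The thresholds below are illustrative assumptions, not the actual values used by any of the named pipelines.

```python
# Toy sketch of heuristic pretraining-data filtering (C4/Gopher-flavored).
# All thresholds are made up for illustration.

def passes_heuristics(doc: str) -> bool:
    words = doc.split()
    # Discard documents that are very short or absurdly long.
    if not (50 <= len(words) <= 100_000):
        return False
    # Mean word length outside a natural-language range suggests gibberish.
    mean_len = sum(len(w) for w in words) / len(words)
    if not (3.0 <= mean_len <= 10.0):
        return False
    # Mostly non-alphabetic content is likely markup, tables, or spam.
    alpha_ratio = sum(c.isalpha() for c in doc) / max(len(doc), 1)
    if alpha_ratio < 0.6:
        return False
    return True
```

The newer classifier-based approach replaces rules like these with an LLM-derived quality score and a much more aggressive cutoff, then compensates for the smaller surviving corpus by repeating (replaying) data over more training tokens.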

Crystal (@crystalsssup)'s Twitter Profile Photo

Kimi's founder, Zhilin Yang's interview is out.
Again, you can let Kimi translate it for you :) Lots of insights there.
mp.weixin.qq.com/s/uqUGwJLO30mR…

Several takes:

1/ Base Model Focus: K2 aims to be a solid base model. We've found that high-quality data growth is slow, and multi-modal