Guilherme Penedo (@gui_penedo)'s Twitter Profile
Guilherme Penedo

@gui_penedo

Pre-training data @huggingface 🤗. Lisboeta 🇵🇹

ID: 547836893

Joined: 07-04-2012 19:07:52

914 Tweets

3.3K Followers

2.2K Following

We've just updated 🍷FineWeb and 📚 FineWeb-Edu with data from all the remaining 2024 CommonCrawl dumps, covering up to December.

🍷FineWeb now has a little over 17 trillion tokens.

Fresh data = more useful models. We'll keep it coming.