
Eniola O.A
@eniolacodes
Software engineer
ID: 1455146106795577344
https://www.eniola.codes/ 01-11-2021 12:14:26
16.16K Tweets
24.24K Followers
1.1K Following



I'm excited to announce the release of the Naijaweb 🇳🇬 dataset. Naijaweb is a dataset of 270,000 documents (230 million GPT-2 tokens) drawn from webpages that Nigerians have shown interest in. It was cleaned with the same techniques Hugging Face used on the FineWeb dataset. A thread 🧵












