saheedniyi/naijaweb
Text Generation · EN, YO, HA
Created by saheedniyi in 2024, saheedniyi/naijaweb is a text generation dataset in English (EN), Yoruba (YO), and Hausa (HA), distributed in Parquet format.
About saheedniyi/naijaweb
Naijaweb Dataset 🇳🇬
Naijaweb is a dataset of over 270,000 documents, totaling approximately 230 million GPT-2 tokens. The data was scraped from web pages popular among Nigerians, providing a rich resource for modeling Nigerian l...
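As a rough sanity check on the size figures above, dividing the token count by the document count gives the average document length. This sketch uses only the two approximate numbers quoted on this card; because the card says "over" 270,000 documents, the true average is probably somewhat lower than the result shown.

```python
# Approximate figures quoted on the card ("over 270,000" documents,
# "approximately 230 million" GPT-2 tokens); the result is only an estimate.
NUM_DOCUMENTS = 270_000
NUM_GPT2_TOKENS = 230_000_000


def average_tokens_per_document(tokens: int, documents: int) -> float:
    """Return the mean number of tokens per document."""
    if documents <= 0:
        raise ValueError("documents must be positive")
    return tokens / documents


avg = average_tokens_per_document(NUM_GPT2_TOKENS, NUM_DOCUMENTS)
print(f"~{avg:.0f} GPT-2 tokens per document")
```

With these inputs the average comes out to roughly 852 GPT-2 tokens per document, i.e. typical web articles rather than very short snippets.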
Details
- Task: Text Generation
- Language: EN, YO, HA
- Format: Parquet
- Rows / instances: N/A
- Creator: saheedniyi
- Year: 2024