uonlp/CulturaX
Text Generation, Fill Mask | AF, ALS, AM
Created by uonlp in 2023, uonlp/CulturaX is a text generation dataset in Parquet format that covers 167 languages, including AF, ALS, and AM.
About uonlp/CulturaX
CulturaX
Cleaned, Enormous, and Public: The Multilingual Fuel to Democratize Large Language Models for 167 Languages
Dataset Summary
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 lan...
Details
- Task: Text Generation, Fill Mask
- Language: AF, ALS, AM
- Format: Parquet
- Rows / instances: N/A
- Creator: uonlp
- Year: 2023