statmt/cc100
The statmt/cc100 dataset is a multilingual text resource for text generation and fill-mask tasks, published by statmt in 2022. Listed languages include AF, AM, and AR.
About statmt/cc100
This corpus is an attempt to recreate the dataset used for training XLM-R. It comprises monolingual data for 100+ languages and also includes data for romanized languages (indicated by *_rom). It was constructed using the URLs and pa...
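The *_rom naming convention mentioned above can be handled with a small helper when sorting language codes. The following is a minimal sketch, not part of the dataset's own tooling: the function names are hypothetical, and the sample codes in the usage note are illustrative assumptions rather than a verified list of available configurations.

```python
# Suffix convention taken from the dataset description ("indicated by *_rom").
ROMANIZED_SUFFIX = "_rom"


def is_romanized(code: str) -> bool:
    """Return True if a language code follows the *_rom convention."""
    return code.lower().endswith(ROMANIZED_SUFFIX)


def base_language(code: str) -> str:
    """Strip the _rom suffix, if present, to get the underlying language code."""
    if is_romanized(code):
        return code[: -len(ROMANIZED_SUFFIX)]
    return code


def split_codes(codes):
    """Partition codes into (native-script, romanized) lists, preserving order."""
    native = [c for c in codes if not is_romanized(c)]
    romanized = [c for c in codes if is_romanized(c)]
    return native, romanized
```

For example, `split_codes(["af", "am", "ar", "hi_rom"])` would separate the three native-script codes from the one romanized code, assuming those codes are the ones of interest.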
Details
- Task: Text Generation, Fill Mask
- Language: AF, AM, AR
- Format: Parquet
- Rows / instances: N/A
- Creator: statmt
- Year: 2022