jobs-git/HPLT2.0_cleaned
Fill Mask, Text Generation | ACE, AF, ALS | cc0-1.0
Created by jobs-git in 2026, jobs-git/HPLT2.0_cleaned is a fill-mask and text-generation dataset in Parquet format whose language tags include ACE, AF and ALS. It has 157.5K downloads and 0 likes, is released under the cc0-1.0 license, and falls in the n>1T size category.
About jobs-git/HPLT2.0_cleaned
This is a large-scale collection of web-crawled documents in 191 world languages, produced by the HPLT project.
The data comes mostly from the Internet Archive, with some additions from Common Crawl.
For a detailed description of the dataset, pl...
Details
- Task: Fill Mask, Text Generation
- Language: ACE, AF, ALS (the full dataset covers 191 languages)
- Format: Parquet
- Rows / instances: N/A
- Size: n>1T
- Creator: jobs-git
- Year: 2026
- License: cc0-1.0
- Downloads: 157,460
- Likes: 0