
jobs-git/HPLT2.0_cleaned

Fill Mask, Text Generation | ACE, AF, ALS | cc0-1.0

Created by jobs-git in 2026, jobs-git/HPLT2.0_cleaned is a fill-mask and text-generation dataset in Parquet format, tagged with the languages ACE, AF, and ALS. It has 157.5K downloads and 0 likes. It is released under the cc0-1.0 license and falls in the n>1T size category.

About jobs-git/HPLT2.0_cleaned

This is a large-scale collection of web-crawled documents in 191 world languages, produced by the HPLT project. The data comes mostly from the Internet Archive, with some additions from Common Crawl. For a detailed description of the dataset, pl...

Details

Task: Fill Mask, Text Generation
Language: ACE, AF, ALS
Format: Parquet
Rows / instances: N/A
Size: n>1T
Creator: jobs-git
Year: 2026
License: cc0-1.0
Downloads: 157,460
Likes: 0

