
HCAI-Lab/dolma3-6t-corpus-manifest

General NLP, English

HCAI-Lab/dolma3-6t-corpus-manifest is a General NLP dataset in English from HCAI-Lab in Parquet format. It has been downloaded 15.2K times.

About HCAI-Lab/dolma3-6t-corpus-manifest

dolma3-6t-corpus-manifest

Unified per-document manifest joining topic/format/quality/token-count/source-shard for the Dolma3 6T corpus. Cross-shard parquet partitioned dataset.

Provenance

This dataset was renamed on 2026-05-25 as...
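To illustrate what a unified per-document manifest involves, the sketch below joins several per-document annotation tables on a shared document id, using only the Python standard library. The document ids, field names, and values are hypothetical and chosen for illustration. They are not the dataset's actual schema, and this is not necessarily how the manifest was built.

```python
from typing import Dict, List

# Hypothetical per-document annotation tables, each keyed by a document id.
# Field names and values are illustrative assumptions, not the real schema.
topics = {"doc-1": "science", "doc-2": "sports"}
formats = {"doc-1": "article", "doc-2": "forum"}
quality = {"doc-1": 0.91, "doc-2": 0.40}
token_counts = {"doc-1": 1200, "doc-2": 310}
source_shards = {"doc-1": "shard-0001", "doc-2": "shard-0002"}


def build_manifest(**tables: Dict[str, object]) -> List[dict]:
    """Inner-join the tables on document id, one output row per document."""
    # Keep only ids present in every table, so each row is fully populated.
    shared_ids = set.intersection(*(set(t) for t in tables.values()))
    return [
        {"doc_id": doc_id, **{name: table[doc_id] for name, table in tables.items()}}
        for doc_id in sorted(shared_ids)
    ]


manifest = build_manifest(
    topic=topics,
    format=formats,
    quality=quality,
    token_count=token_counts,
    source_shard=source_shards,
)
```

Each entry of `manifest` is then a single row carrying every annotation for one document, which is the shape a per-document manifest would typically take before being written out as partitioned Parquet.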

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: HCAI-Lab
Year: 2026
Downloads: 15172
Likes: 0
