chess-pre-to-post/pretrain_v1_20b

General NLP, EN, MIT

chess-pre-to-post/pretrain_v1_20b is an English-language General NLP dataset listed in Parquet format. It is released under the MIT license, falls in the 10B<n<100B size category, and has been downloaded 14,022 times.

About chess-pre-to-post/pretrain_v1_20b

Chess Pre-to-Post: Pretraining Corpus v1 (20B)

Raw tokenized pretraining data for the Chess Pre-to-Post project, stored as sharded NumPy arrays (shard_XXXX/raw.NNNN.npy).

Important: This is an earlier, smaller (20B) snapshot and is no longe...
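As a rough illustration of the sharded layout described above, the following Python sketch walks a local copy of the data and opens each array lazily. Only the shard_XXXX/raw.NNNN.npy naming comes from the description; the helper name `iter_shard_arrays`, the four-digit indices, the `uint16` dtype, and the synthetic demo files are assumptions made for this example and are not confirmed by the dataset card.

```python
import tempfile
from pathlib import Path

import numpy as np


def iter_shard_arrays(root, mmap_mode="r"):
    """Yield (path, array) for each shard_*/raw.*.npy file under root, in sorted order.

    Hypothetical helper: only the file naming pattern is taken from the dataset card.
    """
    for path in sorted(Path(root).glob("shard_*/raw.*.npy")):
        # mmap_mode="r" maps the file instead of reading it fully into memory.
        yield path, np.load(path, mmap_mode=mmap_mode)


# Demo on a tiny synthetic tree that mimics the layout (not the real data).
with tempfile.TemporaryDirectory() as tmp:
    for shard in range(2):
        shard_dir = Path(tmp) / f"shard_{shard:04d}"  # assumed four-digit index
        shard_dir.mkdir()
        for part in range(2):
            # Assumed dtype: the real token dtype is not stated in the card.
            tokens = np.arange(4, dtype=np.uint16) + 10 * shard + part
            np.save(shard_dir / f"raw.{part:04d}.npy", tokens)

    total_tokens = 0
    names = []
    for path, arr in iter_shard_arrays(tmp):
        total_tokens += arr.size
        names.append(f"{path.parent.name}/{path.name}")
    del arr  # drop the last memory map before the temporary directory is removed

print(total_tokens, names)
```

To use this on a real download, point `iter_shard_arrays` at the directory that contains the shard_XXXX folders and check `arr.dtype` and `arr.shape` on the first file before assuming anything about the token encoding.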

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: chess-pre-to-post
Year: 2026
License: MIT
Downloads: 14,022
Likes: 0