chess-pre-to-post/pretrain_v1_20b
General NLP · EN · mit
chess-pre-to-post/pretrain_v1_20b is an English-language General NLP dataset distributed in Parquet format. It is released under the MIT license, falls in the 10B<n<100B size category, and has been downloaded 14K times.
About chess-pre-to-post/pretrain_v1_20b
Chess Pre-to-Post — Pretraining Corpus v1 (20B)
Raw tokenized pretraining data for the Chess Pre-to-Post project, stored as
sharded NumPy arrays (shard_XXXX/raw.NNNN.npy).
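As a minimal sketch of how such a layout might be read, the snippet below walks a local copy of the corpus and memory-maps each file. It assumes the shards have already been downloaded to a local directory and that the XXXX and NNNN indices are zero-padded so that lexicographic order matches numeric order. The helper names (`list_shard_files`, `iter_shards`, `count_elements`) are hypothetical, and the token dtype and array shape are not stated in the card, so the code does not assume them.

```python
from pathlib import Path

import numpy as np


def list_shard_files(root):
    """Return every shard_XXXX/raw.NNNN.npy file under root, sorted by path.

    Assumption: indices are zero-padded, so path order equals numeric order.
    """
    return sorted(Path(root).glob("shard_*/raw.*.npy"))


def iter_shards(root):
    """Yield (path, array) pairs, memory-mapping each file rather than
    reading it fully into RAM."""
    for path in list_shard_files(root):
        yield path, np.load(path, mmap_mode="r")


def count_elements(root):
    """Total number of stored elements across all shard files."""
    return sum(int(arr.size) for _, arr in iter_shards(root))
```

Memory-mapping is used here only because a corpus in this size category is unlikely to fit in memory; inspecting `arr.dtype` and `arr.shape` on the first shard is a reasonable way to learn the actual token encoding before doing anything else.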
Important:
This is an earlier, smaller (20B) snapshot and is no longe...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: chess-pre-to-post
- Year: 2026
- License: mit
- Downloads: 14022
- Likes: 0