pavelslab-nyu/pretrain_v1_54B

General NLP · EN

The pavelslab-nyu/pretrain_v1_54B dataset is an English (EN) General NLP resource published by pavelslab-nyu in 2026.

About pavelslab-nyu/pretrain_v1_54B

Chess Pre-to-Post: Pretraining Corpus v1 (54B)

Raw tokenized pretraining data for the Chess Pre-to-Post project, stored as sharded NumPy arrays (shard_XXXX/raw.NNNN.npy).

Important: This is the maintained version of the corpus. It supersede...
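The shard layout named in the description (shard_XXXX/raw.NNNN.npy) can be explored with a short Python sketch. It is only an illustration under stated assumptions: that the corpus has already been downloaded to a local directory, that the directory and file names follow the pattern literally, and that each file is a standard .npy array. The helper names (list_shard_files, open_array, count_elements) are hypothetical, and the array dtype and shape are not stated in the listing, so the code does not assume them.

```python
from pathlib import Path

import numpy as np


def list_shard_files(root):
    """Return every raw.*.npy file under shard_* directories, sorted by path.

    Assumes the on-disk layout matches shard_XXXX/raw.NNNN.npy.
    """
    return sorted(Path(root).glob("shard_*/raw.*.npy"))


def open_array(path):
    """Memory-map one .npy file read-only instead of reading it into RAM."""
    return np.load(path, mmap_mode="r")


def count_elements(root):
    """Sum the number of array elements across all shard files.

    This equals the token count only if each element stores one token,
    which is an assumption, not something the listing confirms.
    """
    return sum(int(open_array(p).size) for p in list_shard_files(root))
```

Memory-mapping is used here because a corpus of this size is unlikely to fit in memory; slicing the returned array reads only the requested region from disk.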

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: pavelslab-nyu
Year: 2026