pavelslab-nyu/pretrain_v1_54B
General NLP · EN
The pavelslab-nyu/pretrain_v1_54B dataset is an English (EN) General NLP resource created by pavelslab-nyu in 2026.
About pavelslab-nyu/pretrain_v1_54B
Chess Pre-to-Post — Pretraining Corpus v1 (54B)
Raw tokenized pretraining data for the Chess Pre-to-Post project, stored as
sharded NumPy arrays (shard_XXXX/raw.NNNN.npy).
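As a minimal sketch of how such a layout could be read, the snippet below walks the `shard_XXXX/raw.NNNN.npy` files in sorted order and memory-maps each one. The function name `iter_shard_arrays` and the local directory argument are hypothetical, and the card does not state the dtype or shape of the arrays, so the code makes no assumption about either.

```python
from pathlib import Path

import numpy as np


def iter_shard_arrays(root):
    """Yield (path, array) for each raw.NNNN.npy file under shard_XXXX/ directories.

    `root` is assumed to be a local copy of the corpus; this helper is
    illustrative and not part of the dataset's own tooling.
    """
    root = Path(root)
    # Sorting gives a deterministic order: shard_0000 first, then raw.0000, raw.0001, ...
    for path in sorted(root.glob("shard_*/raw.*.npy")):
        # mmap_mode="r" maps the file read-only instead of loading it fully into RAM.
        yield path, np.load(path, mmap_mode="r")
```

Each yielded array can be inspected with `arr.dtype` and `arr.shape` before deciding how to batch or decode it.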
[!IMPORTANT]
This is the maintained version of the corpus. It supersede...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: pavelslab-nyu
- Year: 2026