orionweller/mmBERT-pretraining-data-chunk1
Fill Mask, English
orionweller/mmBERT-pretraining-data-chunk1 is an English fill-mask dataset from orionweller, distributed in Parquet format.
About orionweller/mmBERT-pretraining-data-chunk1
mmBERT Training Data (Ready-to-Use)
Complete Training Dataset: Pre-randomized and ready-to-use multilingual training data (3T tokens) for encoder model pre-training.
This dataset is part of the complete, pre-shuffled training data used to...
Details
- Task: Fill Mask
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: orionweller
- Year: 2025
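
For readers unfamiliar with the fill-mask task named in the details, the following is a minimal sketch of how one masked training example can be built from a token sequence. It is illustrative only and is not taken from the mmBERT training code: the `mask_tokens` helper, the `[MASK]` placeholder string, and the mask rates used are all assumptions, and real pipelines typically operate on tokenizer IDs and may also substitute random or unchanged tokens.

```python
import random

# Hypothetical placeholder; real tokenizers define their own mask token.
MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Build one fill-mask example from a list of string tokens.

    Returns (masked, labels). labels[i] holds the original token where
    position i was masked, and None where the token was left unchanged.
    The default mask_rate is an illustrative assumption.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(tok)         # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)        # no loss is computed at this position
    return masked, labels


tokens = "encoder models are pre-trained by predicting masked tokens".split()
masked, labels = mask_tokens(tokens, mask_rate=0.3, seed=0)
```

An encoder trained on such pairs sees `masked` as input and is scored only on the positions where `labels` is not `None`.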