orionweller/mmBERT-pretraining-data-chunk0
Fill Mask, English
Created by orionweller in 2025, orionweller/mmBERT-pretraining-data-chunk0 is a fill-mask dataset in English, distributed in Parquet format.
About orionweller/mmBERT-pretraining-data-chunk0
mmBERT Training Data (Ready-to-Use)
Complete Training Dataset: Pre-randomized and ready-to-use multilingual training data (3T tokens) for encoder model pre-training.
This dataset is part of the complete, pre-shuffled training data used to...
Details
- Task: Fill Mask
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: orionweller
- Year: 2025