orionweller/mmBERT-pretraining-data-chunk2
Fill Mask, English
The orionweller/mmBERT-pretraining-data-chunk2 dataset is an English fill-mask resource published by orionweller in 2025.
About orionweller/mmBERT-pretraining-data-chunk2
mmBERT Training Data (Ready-to-Use)
Complete Training Dataset: Pre-randomized and ready-to-use multilingual training data (3T tokens) for encoder model pre-training.
This dataset is part of the complete, pre-shuffled training data used to...
Details
- Task: Fill Mask
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: orionweller
- Year: 2025
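
For readers unfamiliar with the fill-mask task listed in the details, the following is a minimal sketch of how masked training examples are commonly derived from raw text. It is not taken from this dataset or from the mmBERT pipeline: the function name `mask_tokens`, the `[MASK]` placeholder, whitespace tokenization, and the masking probability are all assumptions chosen for illustration.

```python
import random

# Hypothetical placeholder; a real tokenizer defines its own mask token.
MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly hide tokens for a fill-mask objective.

    Returns (masked, labels). At a masked position, `masked` holds
    MASK_TOKEN and `labels` holds the original token. Elsewhere,
    `masked` holds the original token and `labels` holds None.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    # Whitespace splitting stands in for a real subword tokenizer.
    text = "encoder models learn by predicting hidden words"
    masked, labels = mask_tokens(text.split(), mask_prob=0.3, seed=1)
    print(" ".join(masked))
    print(labels)
```

A model trained on this objective sees the `masked` sequence as input and is scored only on the positions where `labels` is not None. Production pipelines typically work on subword token IDs and may also replace some selected tokens with random tokens or leave them unchanged; the sketch omits those variants.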