orionweller/mmBERT-pretraining-data-chunk1

Fill Mask | English

orionweller/mmBERT-pretraining-data-chunk1 is a fill-mask dataset in English, published by orionweller in Parquet format.

About orionweller/mmBERT-pretraining-data-chunk1

mmBERT Training Data (Ready-to-Use)

Complete Training Dataset: Pre-randomized and ready-to-use multilingual training data (3T tokens) for encoder model pre-training. This dataset is part of the complete, pre-shuffled training data used to...
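For readers unfamiliar with the fill-mask task this dataset is listed under, the following is a minimal sketch of how a masked training example can be built from plain text. It is an illustration only, not the procedure used to produce this dataset: the `mask_tokens` helper, the `[MASK]` placeholder, the whitespace tokenization, and the masking rate are all assumptions made for the example, and the code does not read the Parquet files themselves.

```python
import random

# Assumed placeholder; real tokenizers define their own mask token.
MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for one fill-mask example.

    labels[i] holds the original token wherever a mask was applied,
    and None elsewhere. mask_prob is an illustrative value, not one
    taken from the dataset description.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)  # hide the token from the model
            labels.append(tok)         # keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(None)        # no loss on unmasked positions
    return masked, labels


if __name__ == "__main__":
    # Whitespace splitting stands in for a real subword tokenizer.
    text = "pre-shuffled multilingual text for encoder pre-training"
    masked, labels = mask_tokens(text.split(), mask_prob=0.3, seed=1)
    print(masked)
    print(labels)
```

An encoder trained on such examples learns to predict the hidden tokens from their surrounding context; production pipelines typically operate on subword IDs rather than whitespace-split words.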

Details

Task: Fill Mask
Language: English
Format: Parquet
Rows / instances: N/A
Creator: orionweller
Year: 2025