jhu-clsp/mmBERT-decay-data
Tags: Fill Mask, English, mit
jhu-clsp/mmBERT-decay-data is a Fill Mask dataset from jhu-clsp, tagged as English and distributed in Parquet format under the MIT license. It has been downloaded 33.1K times.
About jhu-clsp/mmBERT-decay-data
MMBERT Decay Phase Data
Phase 3 of 3: Annealed language learning decay phase (100B tokens) with massive multilingual expansion to 1833 languages.
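The page does not say how the annealing is implemented. A common way to balance languages in multilingual pretraining is temperature-based sampling: language i is drawn with probability proportional to q_i^τ, where q_i is its share of the raw tokens, and lowering τ flattens the distribution toward low-resource languages. The minimal sketch below assumes that formulation; the function name `temperature_sampling_probs`, the token counts, and the τ schedule are all hypothetical and are not taken from the mmBERT data or paper.

```python
from typing import Dict


def temperature_sampling_probs(token_counts: Dict[str, int], tau: float) -> Dict[str, float]:
    """Hypothetical helper: return p_i proportional to q_i ** tau,
    where q_i is language i's share of the total token count."""
    total = sum(token_counts.values())
    weights = {lang: (count / total) ** tau for lang, count in token_counts.items()}
    z = sum(weights.values())  # normalizing constant so probabilities sum to 1
    return {lang: w / z for lang, w in weights.items()}


# Hypothetical token counts for three languages (not real dataset statistics).
counts = {"eng": 1_000_000, "fra": 100_000, "swh": 1_000}

# Hypothetical annealing schedule: tau shrinks over training phases,
# shifting sampling mass from high-resource to low-resource languages.
for tau in (0.7, 0.5, 0.3):
    probs = temperature_sampling_probs(counts, tau)
    print(tau, {lang: round(p, 4) for lang, p in probs.items()})
```

With τ = 1 the probabilities equal the raw token shares; as τ moves toward 0 they approach a uniform distribution. Under this assumed formulation, languages newly added in a later phase would simply appear as extra keys in `counts`.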
📊 Data Composition
NOTE: There are multiple decay data mixtures; this mixture de...
Details
- Task: Fill Mask
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: jhu-clsp
- Year: 2025
- License: mit
- Downloads: 33,129
- Likes: 6
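Because the dataset is tagged for the Fill Mask task, the following simplified sketch shows how masked-language-model inputs and labels are typically derived from a sequence of token ids. It is a generic illustration, not the preprocessing used for mmBERT: `MASK_ID` and the example ids are hypothetical, every selected token is replaced with the mask id (no 80/10/10 random/keep split), and unmasked positions get the label -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import random
from typing import List, Tuple

MASK_ID = 4          # hypothetical [MASK] token id; depends on the tokenizer
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_tokens(
    token_ids: List[int], mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Replace each token with MASK_ID with probability mask_prob.

    Returns (inputs, labels): labels hold the original id at masked
    positions and IGNORE_INDEX everywhere else, so the loss is computed
    only on the tokens the model must fill in.
    """
    inputs: List[int] = []
    labels: List[int] = []
    for tok in token_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


# Example with hypothetical token ids and a fixed seed for reproducibility.
rng = random.Random(0)
ids = [101, 2023, 2003, 1037, 7099, 102]
inputs, labels = mask_tokens(ids, mask_prob=0.3, rng=rng)
print(inputs)
print(labels)
```

A real pipeline would also avoid masking special tokens and would draw the masking rate from the training recipe rather than a hard-coded value.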