Henrychur/MMedC
General NLP · EN, ZH, JA · cc-by-nc-sa-4.0
Henrychur/MMedC is a General NLP dataset covering English, Chinese, and Japanese (EN, ZH, JA), published by Henrychur in 2024. It has 158 downloads and 37 likes, is released under the cc-by-nc-sa-4.0 license, and falls in the 10B<n<100B size category.
About Henrychur/MMedC
MMedC
💻 GitHub Repo · 🖨️ arXiv Paper
The official pre-training dataset for "Towards Building Multilingual Language Model for Medicine".
News
We have added Arabic and German corpora to MMedC.
Introduction
This repo contains MMedC, ...
Details
- Task: General NLP
- Language: EN, ZH, JA
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: Henrychur
- Year: 2024
- License: cc-by-nc-sa-4.0
- Downloads: 158
- Likes: 37