HiTZ/Multilingual-Medical-Corpus
General NLP | EN, ES, FR | apache-2.0
HiTZ/Multilingual-Medical-Corpus is a General NLP dataset from HiTZ, tagged EN, ES, and FR, with 67,367,857 records in Parquet format. It is distributed under the apache-2.0 license, falls in the 10M<n<100M size category, and has been downloaded 469 times.
About HiTZ/Multilingual-Medical-Corpus
Multilingual Medical Corpus
Multilingual-Medical-Corpus is a 3-billion-word multilingual corpus for training LLMs adapted to the medical domain. It includes four languages: English, Spanish, French, and Italian....
Details
- Task: General NLP
- Language: EN, ES, FR
- Format: Parquet
- Rows / instances: 67,367,857
- Size: 10M<n<100M
- Creator: HiTZ
- Year: 2024
- License: apache-2.0
- Downloads: 469
- Likes: 45
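
With roughly 67 million Parquet records, it may be more practical to stream a few rows than to download the whole corpus first. The following is a minimal sketch, assuming the third-party `datasets` library is installed and that the repository loads without an explicit configuration name; neither is confirmed by this page. The helper name `peek` is hypothetical, and no split or column names are assumed.

```python
from itertools import islice

# Repository ID as shown on this page.
REPO_ID = "HiTZ/Multilingual-Medical-Corpus"


def peek(repo_id: str = REPO_ID, n: int = 3) -> dict:
    """Return the first n records of every split, streamed from the Hub.

    Assumption: the repository loads without a configuration name. If it
    defines several configurations, load_dataset will ask for one.
    """
    # Imported lazily so the module can be imported without the dependency.
    from datasets import load_dataset  # third-party: pip install datasets

    # streaming=True avoids downloading the full corpus; with no split
    # argument, the result maps each split name to an iterable dataset.
    splits = load_dataset(repo_id, streaming=True)
    return {name: list(islice(split, n)) for name, split in splits.items()}
```

Calling `peek()` requires network access and returns a dictionary keyed by split name, so inspecting the keys of the first record in any split shows which columns the corpus actually provides.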