
HiTZ/Multilingual-Medical-Corpus

General NLP | EN, ES, FR | apache-2.0

HiTZ/Multilingual-Medical-Corpus is a General NLP dataset from HiTZ, listed for EN, ES, and FR, with 67,367,857 records in Parquet format. It is distributed under the apache-2.0 license, falls in the 10M<n<100M size category, and has been downloaded 469 times.
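With tens of millions of Parquet records, it can be useful to inspect a few rows before committing to a full download. The following is a minimal sketch, assuming the dataset is hosted on the Hugging Face Hub under the identifier shown on this page and that the `datasets` library is installed. The helper name `peek` is hypothetical, and the config and split names are discovered at run time rather than assumed.

```python
from itertools import islice

# Repository identifier as it appears on this page.
REPO_ID = "HiTZ/Multilingual-Medical-Corpus"


def peek(repo_id=REPO_ID, n=3):
    """Return the first n records of the first config/split, streamed.

    Hypothetical helper: needs `pip install datasets` and network access.
    """
    # Imported lazily so that defining this function has no dependencies.
    from datasets import (
        get_dataset_config_names,
        get_dataset_split_names,
        load_dataset,
    )

    # Assumption: the first listed config and split are representative.
    config = get_dataset_config_names(repo_id)[0]
    split = get_dataset_split_names(repo_id, config)[0]

    # streaming=True iterates over records without downloading every file.
    stream = load_dataset(repo_id, config, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` returns a short list of dictionaries, one per record. The field names depend on the dataset's schema, which this page does not describe.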

About HiTZ/Multilingual-Medical-Corpus

Multilingual Medical Corpus. Multilingual-Medical-Corpus is a 3 billion word multilingual corpus for training LLMs adapted to the medical domain. Multilingual-Medical-Corpus includes four languages, namely, English, Spanish, French, and Italian....

Details

Task: General NLP
Language: EN, ES, FR
Format: Parquet
Rows / instances: 67,367,857
Size: 10M<n<100M
Creator: HiTZ
Year: 2024
License: apache-2.0
Downloads: 469
Likes: 45
