ctheodoris/Genecorpus-30M

General NLP · English · apache-2.0

Created by ctheodoris in 2022, ctheodoris/Genecorpus-30M is a General NLP dataset in English, distributed in Parquet format. It has 509 downloads and 85 likes, and is released under the apache-2.0 license.

About ctheodoris/Genecorpus-30M

Dataset Card for Genecorpus-30M

Dataset Description

Point of Contact: christina.theodoris@gladstone.ucsf.edu

Dataset Summary

We assembled a large-scale pretraining corpus, Genecorpus-30M, comprised of ~30 million human ...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: ctheodoris
Year: 2022
License: apache-2.0
Downloads: 509
Likes: 85