ctheodoris/Genecorpus-30M
General NLP | English | apache-2.0
Created by ctheodoris in 2022, ctheodoris/Genecorpus-30M is a General NLP dataset in English, distributed in Parquet format. It has 509 downloads and 85 likes, and it is released under the apache-2.0 license.
About ctheodoris/Genecorpus-30M
Dataset Card for Genecorpus-30M
Dataset Description
Point of Contact: christina.theodoris@gladstone.ucsf.edu
Dataset Summary
We assembled a large-scale pretraining corpus, Genecorpus-30M, comprising ~30 million human ...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: ctheodoris
- Year: 2022
- License: apache-2.0
- Downloads: 509
- Likes: 85