ccdv/pubmed-summarization
Summarization, Text Generation, EN
ccdv/pubmed-summarization is an English (EN) summarization dataset that provides 266,430 labeled examples distributed in Parquet format. It falls in the 100K<n<1M size category and has been downloaded 4.6K times.
About ccdv/pubmed-summarization
PubMed dataset for summarization
Dataset for summarization of long documents. Adapted from this repo. Note that the original data are pre-tokenized, so this dataset returns " ".join(text) and adds "\n" for paragraphs. This dataset is compatible with th...
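As a rough illustration of the joining step described above, the sketch below rebuilds plain text from pre-tokenized input. It assumes each document is a list of paragraphs and each paragraph is a list of tokens; that layout, the function name `join_pretokenized`, and the sample sentences are hypothetical and are not taken from the dataset's own loading script.

```python
from typing import List


def join_pretokenized(paragraphs: List[List[str]]) -> str:
    """Join tokens with single spaces and paragraphs with newlines.

    Assumption: `paragraphs` is a list of token lists. The real source
    layout may differ; this only mirrors the " ".join(text) plus "\n"
    behaviour mentioned in the dataset description.
    """
    return "\n".join(" ".join(tokens) for tokens in paragraphs)


# Hypothetical pre-tokenized input: two paragraphs of tokens.
example = [
    ["the", "patient", "was", "treated", "."],
    ["no", "adverse", "events", "were", "reported", "."],
]

text = join_pretokenized(example)
print(text)
```

Because the tokens are joined with plain spaces, punctuation stays separated from the preceding word (for example `treated .`), so downstream code may want to detokenize before display.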
Details
- Task: Summarization, Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: 266,430
- Size: 100K<n<1M
- Creator: ccdv
- Year: 2022
- Downloads: 4,639
- Likes: 90