jedibear/s2orc_full
Text Generation, Feature Extraction, Text Classification, EN, odc-by
jedibear/s2orc_full is an English-language (EN) text generation dataset from jedibear with 14,515,649 records in Parquet format. It is distributed under the odc-by license, falls in the 10M<n<100M size category, and has been downloaded 23.7K times.
About jedibear/s2orc_full
S2ORC Full — Semantic Scholar Open Research Corpus
A complete redistribution of the S2ORC dataset in Parquet format on Hugging Face, containing 14.5 million academic papers with full text, structured metadata, and citation information.
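Because the corpus holds about 14.5 million records, streaming a few rows is usually more practical than downloading everything first. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository exposes a split named `train` (the split name is an assumption, not something this page states). The helper names `stream_records` and `preview` are hypothetical, and no column names are assumed.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List

REPO_ID = "jedibear/s2orc_full"  # dataset identifier as given on this page


def stream_records(repo_id: str = REPO_ID, split: str = "train") -> Iterator[dict]:
    """Lazily iterate over records without downloading the whole corpus.

    Assumption: the repository has a split called "train"; change it if not.
    """
    # Imported lazily so that preview() below works without the library.
    from datasets import load_dataset

    return iter(load_dataset(repo_id, split=split, streaming=True))


def preview(records: Iterable[Any], n: int = 3) -> List[Any]:
    """Return the first n items of an iterable, reading no more than n."""
    return list(islice(records, n))
```

Calling `preview(stream_records(), 2)` and printing the keys of each returned record is one way to discover the actual schema before committing to a full download.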
...
Details
- Task: Text Generation, Feature Extraction, Text Classification
- Language: EN
- Format: Parquet
- Rows / instances: 14,515,649
- Size: 10M<n<100M
- Creator: jedibear
- Year: 2026
- License: odc-by
- Downloads: 23,721
- Likes: 0