AlgorithmicResearchGroup/s2orc_arxiv
Text Generation, Summarization, Feature Extraction, EN, Benchmark
AlgorithmicResearchGroup/s2orc_arxiv is an English-language benchmark dataset focused on text generation, distributed in Parquet format. It falls in the 1M<n<10M size category and has been downloaded 20.5K times.
📊 This dataset is used as an LLM benchmark.
About AlgorithmicResearchGroup/s2orc_arxiv
S2ORC ArXiv
A subset of the Semantic Scholar Open Research Corpus (S2ORC) filtered to ArXiv papers. Contains 2.58 million parsed scientific papers with full text, abstracts, structured sections, figures, and citation metadata.
D...
Details
- Task: Text Generation, Summarization, Feature Extraction
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1M<n<10M
- Creator: AlgorithmicResearchGroup
- Year: 2026
- Downloads: 20501
- Likes: 2
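Because the corpus holds millions of papers, reading a few records lazily is usually more practical than downloading every Parquet file first. The following is a minimal sketch, not an official loading recipe. It assumes the dataset is hosted on the Hugging Face Hub under the ID shown on this card, that the `datasets` library is installed, and that a split named `train` exists. The helper names `peek`, `stream_papers`, and `show_field_names` are hypothetical. No column names are assumed; the sketch prints whatever fields the records contain.

```python
from itertools import islice

# Dataset ID as it appears on this card.
DATASET_ID = "AlgorithmicResearchGroup/s2orc_arxiv"


def peek(records, n=3):
    """Return the first n items of any iterable as a list."""
    return list(islice(records, n))


def stream_papers(split="train"):
    """Open the dataset lazily instead of downloading it in full.

    ASSUMPTIONS: the dataset lives on the Hugging Face Hub and has a
    split named "train"; neither is confirmed by this card.
    """
    # Imported here so the module loads even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=True)


def show_field_names(n=3):
    """Print the field names of the first n records (needs network access)."""
    for record in peek(stream_papers(), n):
        print(sorted(record.keys()))
```

Calling `show_field_names()` would print the field names of the first three records, which is a quick way to check the actual schema before writing any processing code against it.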