arxiv-community/arxiv_dataset
Translation | Summarization | Text Retrieval | EN | cc0-1.0
arxiv-community/arxiv_dataset is an English-language dataset tagged for translation, summarization, and text retrieval. It provides 2,349,354 rows in Parquet format, is released under the cc0-1.0 license, falls in the 1M<n<10M size category, and has been downloaded 771 times.
About arxiv-community/arxiv_dataset
A dataset of 1.7 million arXiv articles for applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.
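As a rough illustration of the trend-analysis use case mentioned above, the sketch below counts records per year and category. It is a minimal sketch under stated assumptions: the field names `categories` (space-separated category codes) and `update_date` (an ISO date string) are assumed rather than confirmed by this page, and the sample rows are made up for illustration instead of being read from the Parquet files.

```python
from collections import Counter

# Hypothetical sample rows. The field names `categories` and `update_date`
# are assumptions about the schema, not something this page confirms.
SAMPLE_ROWS = [
    {"categories": "cs.CL cs.LG", "update_date": "2021-03-02"},
    {"categories": "cs.CL", "update_date": "2021-11-15"},
    {"categories": "math.PR", "update_date": "2022-01-09"},
]


def count_by_year_and_category(rows):
    """Count rows per (year, category) pair.

    A row listing several categories contributes one count to each of them.
    """
    counts = Counter()
    for row in rows:
        year = row["update_date"][:4]  # leading "YYYY" of the ISO date
        for category in row["categories"].split():
            counts[(year, category)] += 1
    return counts


counts = count_by_year_and_category(SAMPLE_ROWS)
for (year, category), n in sorted(counts.items()):
    print(year, category, n)
```

The same function should work on any iterable of dict-like rows, so it could be applied to rows read from the Parquet files once the actual column names have been checked.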
Details
- Task: Translation, Summarization, Text Retrieval
- Language: EN
- Format: Parquet
- Rows / instances: 2,349,354
- Size: 1M<n<10M
- Creator: arxiv-community
- Year: 2022
- License: cc0-1.0
- Downloads: 771
- Likes: 135