arxiv-community/arxiv_dataset

Translation, Summarization, Text Retrieval, EN, cc0-1.0

arxiv-community/arxiv_dataset is an English-language dataset tagged for translation, summarization, and text retrieval. It provides 2,349,354 rows in Parquet format, is distributed under the cc0-1.0 license, falls in the 1M<n<10M size category, and has been downloaded 771 times.

About arxiv-community/arxiv_dataset

A dataset of 1.7 million arXiv articles for applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.
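As a rough illustration of the category-oriented uses mentioned above, the sketch below streams rows and tallies each paper's first listed category. It assumes the Hugging Face `datasets` library is installed, that the split is named "train", and that each row has a "categories" field holding a space-separated string such as "math.CO cs.CG". None of these details are stated on this page, so check them against the dataset itself. The helper names `stream_records` and `primary_category_counts` are hypothetical.

```python
from collections import Counter
from itertools import islice


def stream_records(repo_id="arxiv-community/arxiv_dataset", split="train"):
    """Lazily iterate over rows from the Hub (needs `pip install datasets`).

    Assumption: the split is called "train"; verify on the dataset page.
    """
    # Imported here so the counting helper below works without the library.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


def primary_category_counts(records, limit=None):
    """Count the first category listed on each record.

    Assumption: each record is a dict whose "categories" value is a
    space-separated string; records without categories are skipped.
    """
    counts = Counter()
    for record in islice(records, limit):
        categories = (record.get("categories") or "").split()
        if categories:
            counts[categories[0]] += 1
    return counts
```

A call such as `primary_category_counts(stream_records(), limit=1000).most_common(5)` would then inspect only the first 1,000 rows, which avoids downloading all of the Parquet files up front.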

Details

Task: Translation, Summarization, Text Retrieval
Language: EN
Format: Parquet
Rows / instances: 2,349,354
Size: 1M<n<10M
Creator: arxiv-community
Year: 2022
License: cc0-1.0
Downloads: 771
Likes: 135
FAQ