webis/tldr-17

Summarization | EN

webis/tldr-17 is an English (EN) summarization dataset distributed in Parquet format.

About webis/tldr-17

This corpus contains preprocessed posts from the Reddit dataset. It consists of 3,848,330 posts, with an average length of 270 words for the content and 28 words for the summary. Features include strings: author, body, normalizedBody, conte...
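
The average lengths quoted above can be checked on any subset of rows. The following is a minimal sketch, not the creators' method: it assumes words are counted by splitting on whitespace, that the post text is in the `body` field, and that the summary is in a column named `summary` (an assumed name, since the feature list above is truncated). The sample rows are made up for illustration.

```python
def average_word_counts(rows, text_key="body", summary_key="summary"):
    """Return (mean words per text, mean words per summary).

    Words are counted by whitespace splitting, which is an assumption;
    the corpus statistics may have used a different tokenizer.
    """
    n = 0
    text_words = 0
    summary_words = 0
    for row in rows:
        text_words += len(row[text_key].split())
        summary_words += len(row[summary_key].split())
        n += 1
    if n == 0:
        raise ValueError("no rows given")
    return text_words / n, summary_words / n


# Made-up sample rows, not taken from the corpus.
# The "summary" key is an assumed column name.
sample = [
    {"body": "I tried three editors this week and kept the first one",
     "summary": "kept the first editor"},
    {"body": "Long story about a bike",
     "summary": "bike story"},
]

avg_text, avg_summary = average_word_counts(sample)
print(avg_text, avg_summary)
```

With a downloaded Parquet file, rows in this shape could be obtained with something like `pandas.read_parquet(path).to_dict("records")`, where `path` is whatever local file you saved.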

Details

Task: Summarization
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: webis
Year: 2022