EleutherAI/the_pile_deduplicated

General NLP, English

EleutherAI/the_pile_deduplicated is a General NLP dataset in English from EleutherAI, distributed in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 16.4K times.
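
The size category and the rounded download figure are both compact encodings of plain numbers. The listing gives no exact row count, so the category only bounds it between 100 million and 1 billion rows. The following is a minimal sketch of how such strings might be interpreted, assuming the category always follows a `LOWER<n<UPPER` pattern with optional K/M/B/T suffixes. The function names `parse_size_category` and `abbreviate_count` are hypothetical helpers, not part of any library.

```python
import re

# Multipliers for the suffixes assumed to appear in size-category strings.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_category(category: str) -> tuple[int, int]:
    """Return (lower, upper) bounds for a string such as '100M<n<1B'."""
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", category)
    if match is None:
        raise ValueError(f"unrecognized size category: {category!r}")
    lo_num, lo_suf, hi_num, hi_suf = match.groups()
    return int(lo_num) * _SUFFIX[lo_suf], int(hi_num) * _SUFFIX[hi_suf]


def abbreviate_count(count: int) -> str:
    """Abbreviate a count to one decimal place with a K/M/B suffix."""
    for suffix, factor in (("B", 10**9), ("M", 10**6), ("K", 10**3)):
        if count >= factor:
            return f"{count / factor:.1f}{suffix}"
    return str(count)


lower, upper = parse_size_category("100M<n<1B")
print(lower, upper)             # 100000000 1000000000
print(abbreviate_count(16356))  # 16.4K
```

The second call shows that the 16.4K figure is the raw download count of 16356 rounded to one decimal place.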

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: EleutherAI
Year: 2022
Downloads: 16356
Likes: 114
