EleutherAI/pile

Text Generation, Fill Mask, EN

The EleutherAI/pile dataset is an English (EN) text generation resource from EleutherAI, listed with the year 2022. With about 6K downloads (5,950) and 500 likes at the time of this listing, it sees active use in the community. It is released under a license listed as "other" and falls in the 100B<n<1T size category.

About EleutherAI/pile

The Pile is an 825 GiB diverse, open-source language modelling dataset that combines 22 smaller, high-quality datasets.
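The 825 GiB figure is given in binary units (1 GiB = 2^30 bytes), while storage is often quoted in decimal gigabytes (1 GB = 10^9 bytes). The minimal sketch below shows the conversion. It is only unit arithmetic on the stated size, not a figure from the dataset card, and the helper name `gib_to_bytes` is hypothetical.

```python
# Minimal sketch: convert the Pile's stated size from GiB (binary) to bytes and GB (decimal).
GIB = 2 ** 30  # one gibibyte = 1,073,741,824 bytes
GB = 10 ** 9   # one gigabyte = 1,000,000,000 bytes


def gib_to_bytes(gib: int) -> int:
    """Return the number of bytes in `gib` gibibytes (hypothetical helper)."""
    return gib * GIB


pile_bytes = gib_to_bytes(825)  # stated size of the Pile in GiB
pile_gb = pile_bytes / GB       # same size expressed in decimal gigabytes

print(f"{pile_bytes:,} bytes")  # 885,837,004,800 bytes
print(f"{pile_gb:.1f} GB")      # 885.8 GB
```

So 825 GiB corresponds to roughly 886 GB in decimal units.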

Details

Task: Text Generation, Fill Mask
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 100B<n<1T
Creator: EleutherAI
Year: 2022
License: other
Downloads: 5950
Likes: 500