gmongaras/SlimPajama-627B_Reupload

General NLP, English

gmongaras/SlimPajama-627B_Reupload is a general NLP dataset in English that provides 591,399,449 examples distributed in Parquet format. It falls in the 100M<n<1B size category and has been downloaded about 19K times.

About gmongaras/SlimPajama-627B_Reupload

Because `datasets` limits the number of calls that can be made to Hugging Face, downloading SlimPajama-627B is problematic: it is composed of a very large number of small files. I have reuploaded it here in larger chunks so the dataset can be downloaded easily without having to do any...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 591,399,449
Size: 100M<n<1B
Creator: gmongaras
Year: 2025
Downloads: 18,956
Likes: 10