gmongaras/SlimPajama-627B_Reupload
General NLP, English
gmongaras/SlimPajama-627B_Reupload is a General NLP-focused dataset in English that provides 591,399,449 labeled examples distributed in Parquet format. It falls in the 100M<n<1B size category and has been downloaded about 19K times.
About gmongaras/SlimPajama-627B_Reupload
Because the datasets library limits the number of calls to Hugging Face, downloading SlimPajama-627B is problematic: it is composed of a very large number of small files.
I have reuploaded it here as larger chunks to easily download the dataset without having to do any...
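As a minimal sketch of how the reuploaded chunks might be read, the snippet below streams a few rows with the Hugging Face `datasets` library instead of downloading everything first. The split name `"train"` and the helper name `stream_examples` are assumptions for illustration, not something this page confirms. The column names are not listed here either, so the example only prints whatever keys each row contains.

```python
REPO_ID = "gmongaras/SlimPajama-627B_Reupload"


def stream_examples(limit=3, split="train"):
    """Yield the first `limit` rows without downloading the whole dataset.

    The default split name "train" is an assumption; adjust it if the
    repository exposes different splits.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True reads the Parquet shards on demand instead of
    # materializing the full dataset on local disk.
    dataset = load_dataset(REPO_ID, split=split, streaming=True)
    for index, row in enumerate(dataset):
        if index >= limit:
            break
        yield row


if __name__ == "__main__":
    for row in stream_examples(limit=3):
        # Column names are not documented on this page, so inspect the keys.
        print(sorted(row.keys()))
```

Streaming is a reasonable starting point for a dataset of this size, since it lets you inspect the schema before committing disk space to a full download.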
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 591,399,449
- Size: 100M<n<1B
- Creator: gmongaras
- Year: 2025
- Downloads: 18,956
- Likes: 10