DKYoon/SlimPajama-6B

Text Generation, EN

Created by DKYoon in 2023, DKYoon/SlimPajama-6B is an English text generation dataset containing 5,507,693 records in Parquet format. It has 16.8K downloads and 63 likes, and falls in the 1M<n<10M size category.

About DKYoon/SlimPajama-6B

Sampled version of cerebras/SlimPajama-627B. Since the original data was shuffled before chunking, I only downloaded train/chunk1 (of 10 total) and further sampled 10%. This should result in roughly 6B tokens, hence SlimPajama-6B. The dataset is 2...
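
The token count in the description can be checked with a little arithmetic. The following is a minimal sketch, not the author's actual script: it assumes the 627B figure from the parent dataset's name, assumes the 10 train chunks are roughly equal in size, and uses a hypothetical per-record random draw (`sample_records`) to illustrate what a 10% sample might look like.

```python
import random

# Figures taken from the description above; equal-sized chunks are an assumption.
FULL_TOKENS = 627e9   # implied by the name cerebras/SlimPajama-627B
NUM_CHUNKS = 10       # train chunks, of which only chunk1 was downloaded
SAMPLE_RATE = 0.10    # further sampling applied to that chunk


def estimated_tokens(full_tokens=FULL_TOKENS, num_chunks=NUM_CHUNKS,
                     sample_rate=SAMPLE_RATE):
    """Estimate the token count of one chunk after sub-sampling."""
    return full_tokens / num_chunks * sample_rate


def sample_records(records, rate=SAMPLE_RATE, seed=0):
    """Hypothetical helper: keep each record independently with probability `rate`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]


print(f"{estimated_tokens() / 1e9:.2f}B tokens")  # prints "6.27B tokens"
```

Because the original data was shuffled before chunking, a single chunk should be representative of the whole, which is why one chunk followed by a 10% sample is expected to land near 6B tokens.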

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: 5,507,693
Size: 1M<n<10M
Creator: DKYoon
Year: 2023
Downloads: 16,840
Likes: 63
