DKYoon/SlimPajama-6B
Text Generation | EN
Created by DKYoon in 2023, DKYoon/SlimPajama-6B is an English text generation dataset containing 5,507,693 records in Parquet format. It has 16.8K downloads and 63 likes, and falls in the 1M<n<10M size category.
About DKYoon/SlimPajama-6B
Sampled version of cerebras/SlimPajama-627B.
Since the original data was shuffled before chunking, I only downloaded train/chunk1 (of 10 total) and further sampled 10%. This should result in roughly 6B tokens, hence SlimPajama-6B.
The dataset is 2...
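The "roughly 6B tokens" figure follows from the sampling steps described above. A minimal sketch of that arithmetic, assuming the parent train split holds about 627B tokens (read off the parent dataset's name) and that the 10 chunks are about equal in size because the data was shuffled first. The constant and function names are illustrative only, and the per-record average is a derived estimate, not a figure stated by the dataset author.

```python
# Figures come from the description above. The 627B total is taken from the
# parent dataset's name and is an approximation (assumption).
PARENT_TOKENS = 627e9     # cerebras/SlimPajama-627B
NUM_TRAIN_CHUNKS = 10     # "train/chunk1 (of 10 total)"
SAMPLE_FRACTION = 0.10    # "further sampled 10%"
NUM_RECORDS = 5_507_693   # row count listed for this dataset


def estimate_tokens(parent_tokens: float, num_chunks: int, sample_fraction: float) -> float:
    """Expected tokens after keeping one of `num_chunks` equal chunks
    and then sampling `sample_fraction` of it."""
    return parent_tokens / num_chunks * sample_fraction


estimated = estimate_tokens(PARENT_TOKENS, NUM_TRAIN_CHUNKS, SAMPLE_FRACTION)
avg_tokens_per_record = estimated / NUM_RECORDS

print(f"Estimated tokens: {estimated / 1e9:.2f}B")
print(f"Implied average tokens per record: {avg_tokens_per_record:.0f}")
```

Under these assumptions the estimate comes to about 6.27B tokens, which matches the "6B" in the dataset name.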
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: 5,507,693
- Size: 1M<n<10M
- Creator: DKYoon
- Year: 2023
- Downloads: 16,840
- Likes: 63