karpathy/tinystories-gpt4-clean
General NLP, English
The karpathy/tinystories-gpt4-clean dataset is an English-language General NLP dataset from karpathy, distributed in Parquet format.
About karpathy/tinystories-gpt4-clean
TinyStories GPT-4 Clean
A cleaned subset of the TinyStories dataset (Eldan & Li, 2023) that keeps only the GPT-4-generated stories. The cleaning is adapted from a thread that pointed out many issues with the original data and proposed a cleaning process.
...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: karpathy
- Year: 2026