karpathy/tinystories-gpt4-clean

General NLP, English

karpathy/tinystories-gpt4-clean is an English General NLP dataset published by karpathy in Parquet format.

About karpathy/tinystories-gpt4-clean

TinyStories GPT-4 Clean: a cleaned subset of the TinyStories dataset (Eldan & Li, 2023) that keeps only the GPT-4-generated stories. Adapted from a thread that pointed out many issues with the original data and proposed a cleaning process. ...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: karpathy
Year: 2026