common-pile/comma_v0.1_training_dataset
General NLP, English
common-pile/comma_v0.1_training_dataset is a General NLP dataset in English from common-pile, distributed in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 24.2K times.
About common-pile/comma_v0.1_training_dataset
Comma v0.1 dataset
This repository contains the dataset used to train Comma v0.1-1T and Comma v0.1-2T.
It is a slightly modified and consolidated version of the Common Pile v0.1 "filtered" data.
If you are looking for the raw Common Pile v0.1 d...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: common-pile
- Year: 2025
- Downloads: 24212
- Likes: 42