common-pile/comma_v0.1_training_dataset

General NLP, English

common-pile/comma_v0.1_training_dataset is an English General NLP dataset from common-pile, distributed in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 24.2K times.

About common-pile/comma_v0.1_training_dataset

Comma v0.1 dataset: This repository contains the dataset used to train Comma v0.1-1T and Comma v0.1-2T. It is a slightly modified and consolidated version of the Common Pile v0.1 "filtered" data. If you are looking for the raw Common Pile v0.1 d...
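
Because the dataset sits in the 100M<n<1B size category, it can be useful to inspect a few rows without downloading every Parquet file. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository exposes a split named "train" (the split name is an assumption, not stated on this page). The helper name `stream_first_rows` is hypothetical, and no column names are assumed.

```python
from itertools import islice

# Repository id as shown on this page.
REPO_ID = "common-pile/comma_v0.1_training_dataset"


def stream_first_rows(n, split="train"):
    """Return the first n rows as dicts without downloading the full dataset.

    The default split name "train" is an assumption; adjust it if the
    repository uses a different one.
    """
    if n <= 0:
        return []
    # Imported lazily so this module loads even where `datasets` is absent.
    from datasets import load_dataset

    # streaming=True yields rows lazily instead of fetching all Parquet shards.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `stream_first_rows(3)` needs network access and returns a list of three row dictionaries; printing their keys is a quick way to discover the actual column names.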

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: common-pile
Year: 2025
Downloads: 24212
Likes: 42
