
BAAI/CCI2-Data

Text Generation · ZH

BAAI/CCI2-Data is a Chinese (ZH) text generation dataset from BAAI with 178,959,936 records in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 497 times.
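The "100M<n<1B" label is an order-of-magnitude bucket for the record count. The minimal sketch below shows how such a label could be derived from a row count. It assumes bucket boundaries modeled on Hugging Face's `size_categories` tags. The inclusive handling of lower bounds and the helper name `size_category` are illustrative assumptions, not part of the source.

```python
# Hypothetical helper: map a record count to a size-category label.
# Thresholds are an assumption modeled on Hugging Face's size_categories
# tags; treating each lower bound as inclusive is also an assumption.
_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n: int) -> str:
    """Return the size-category label for a dataset with n records."""
    if n < 0:
        raise ValueError("record count must be non-negative")
    # Walk the buckets in ascending order; the first upper bound that
    # exceeds n identifies the bucket.
    for upper, label in _BUCKETS:
        if n < upper:
            return label
    return "n>1T"


print(size_category(178_959_936))  # → 100M<n<1B
```

With 178,959,936 records, the count is at least 100 million but below 1 billion, which matches the category shown on this page.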

About BAAI/CCI2-Data

Data Description: To address the scarcity of high-quality safety datasets in Chinese, we open-sourced the CCI (Chinese Corpora Internet) dataset on November 29, 2023. Building on this foundation, we continue to expand the data sources, adopt ...

Details

Task: Text Generation
Language: ZH
Format: Parquet
Rows / instances: 178,959,936
Size category: 100M<n<1B
Creator: BAAI
Year: 2024
Downloads: 497
Likes: 57

