BAAI/CCI2-Data
Text Generation | ZH
BAAI/CCI2-Data is a Chinese (ZH) text generation dataset from BAAI with 178,959,936 records in Parquet format. It falls in the 100M<n<1B size category and has been downloaded 497 times.
About BAAI/CCI2-Data
Data Description
To address the scarcity of high-quality safety datasets in Chinese, we open-sourced the CCI (Chinese Corpora Internet) dataset on November 29, 2023. Building on this foundation, we continue to expand the data source, adopt ...
Details
- Task: Text Generation
- Language: ZH
- Format: Parquet
- Rows / instances: 178,959,936
- Size: 100M<n<1B
- Creator: BAAI
- Year: 2024
- Downloads: 497
- Likes: 57
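The size category listed above can be checked against the row count with a few lines of Python. This is only a minimal sketch: the helper name `size_category`, the power-of-ten bucket boundaries, the treatment of exact boundary values, and the catch-all `n>1B` label are all assumptions made for illustration, not something the listing specifies.

```python
# Hypothetical helper: map a row count to a size-category label.
# Assumption: buckets are split at powers of ten, as the label
# "100M<n<1B" suggests; exact boundary values fall into the upper bucket.
def size_category(n: int) -> str:
    bounds = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
        (10_000_000, "1M<n<10M"),
        (100_000_000, "10M<n<100M"),
        (1_000_000_000, "100M<n<1B"),
    ]
    # Return the first bucket whose upper bound exceeds n.
    for upper, label in bounds:
        if n < upper:
            return label
    # Catch-all label for anything larger (simplified for this sketch).
    return "n>1B"


rows = 178_959_936  # row count taken from the listing
category = size_category(rows)
print(category)
```

With the row count from the listing, the sketch lands in the same bucket the listing reports, so the two fields are consistent with each other.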