thu-coai/lccc
General NLP | ZH
The thu-coai/lccc dataset is a Chinese-language (ZH) dataset for general NLP tasks, distributed in Parquet format.
About thu-coai/lccc
The Large-scale Cleaned Chinese Conversation corpus (LCCC) is a large corpus of Chinese conversations.
A rigorous data cleaning pipeline is designed to ensure the quality of the corpus.
This pipeline involves a set of rules and several classifie...
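The card does not list the actual cleaning rules, so the following is only a minimal sketch of what a rule-based filtering stage for dialogue data can look like. The rules here (length bounds, a URL filter, and a run-of-repeated-characters filter) and the names `is_clean_utterance` and `clean_dialogue` are hypothetical illustrations, not LCCC's published pipeline. The classifier stage mentioned above is not modeled.

```python
import re
from typing import List, Optional

# HYPOTHETICAL rules for illustration only; these are not the actual LCCC rules.
URL_PATTERN = re.compile(r"https?://\S+")
REPEAT_PATTERN = re.compile(r"(.)\1{5,}")  # the same character 6+ times in a row


def is_clean_utterance(utt: str, min_len: int = 1, max_len: int = 200) -> bool:
    """Return True if a single utterance passes every hypothetical rule."""
    text = utt.strip()
    if not (min_len <= len(text) <= max_len):
        return False  # empty or overly long utterance
    if URL_PATTERN.search(text):
        return False  # contains a URL
    if REPEAT_PATTERN.search(text):
        return False  # degenerate repetition such as a long run of one character
    return True


def clean_dialogue(turns: List[str], min_turns: int = 2) -> Optional[List[str]]:
    """Keep the longest clean prefix of a dialogue; return None if it is too short."""
    kept: List[str] = []
    for turn in turns:
        if not is_clean_utterance(turn):
            break  # cut at the first bad turn so the kept context stays contiguous
        kept.append(turn.strip())
    return kept if len(kept) >= min_turns else None


if __name__ == "__main__":
    dialogues = [
        ["你好", "你好呀，最近怎么样？", "还不错"],
        ["看这个 http://example.com", "好的"],
        ["哈哈哈哈哈哈哈哈", "？"],
    ]
    cleaned = [d for d in (clean_dialogue(t) for t in dialogues) if d is not None]
    print(cleaned)  # [['你好', '你好呀，最近怎么样？', '还不错']]
```

Truncating at the first failing turn, rather than deleting that turn alone, is one possible design choice: it avoids stitching together turns that were not adjacent in the original conversation.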
Details
- Task: General NLP
- Language: ZH
- Format: Parquet
- Rows / instances: N/A
- Creator: thu-coai
- Year: 2022