thu-coai/lccc

General NLP, ZH

thu-coai/lccc is a Chinese-language (ZH) dataset for general NLP tasks, distributed in Parquet format.

About thu-coai/lccc

The Large-scale Cleaned Chinese Conversation corpus (LCCC) is a large corpus of Chinese conversations. A rigorous data cleaning pipeline is designed to ensure the quality of the corpus. This pipeline involves a set of rules and several classifie...

Details

Task: General NLP
Language: ZH
Format: Parquet
Rows / instances: N/A
Creator: thu-coai
Year: 2022
