CASIA-LM/ChineseWebText2.0

CASIA-LM/ChineseWebText2.0 is a General NLP dataset of Chinese web text, distributed in Parquet format under the apache-2.0 license. It falls in the 1K<n<10K size category and has been downloaded about 2.9K times.

About CASIA-LM/ChineseWebText2.0

ChineseWebText 2.0: Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information

This directory contains the ChineseWebText2.0 dataset, and a new tool-chain called MDFG-tool for constructing large-scale and high...

Details

Task: General NLP
Language: Chinese
Format: Parquet
Rows / instances: N/A
Size: 1K<n<10K
Creator: CASIA-LM
Year: 2024
License: apache-2.0
Downloads: 2877
Likes: 32
