
CASIA-LM/ChineseWebText

General NLP | English

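CASIA-LM/ChineseWebText is a general NLP dataset distributed in Parquet format. Its catalog metadata lists the language as English, although the dataset's name and description indicate Chinese web text. It falls in the 1K<n<10K size category and has been downloaded about 1.3K times.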

About CASIA-LM/ChineseWebText

ChineseWebText: Large-Scale High-quality Chinese Web Text Extracted with Effective Evaluation Model. This directory contains the ChineseWebText dataset and the EvalWeb tool-chain for processing CommonCrawl data. Our EvalWeb tool is publicly availab...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 1K<n<10K
Creator: CASIA-LM
Year: 2023
Downloads: 1311
Likes: 44

