CASIA-LM/ChineseWebText
General NLP, English
CASIA-LM/ChineseWebText is a General NLP dataset in English, distributed in Parquet format. It falls in the 1K<n<10K size category and has been downloaded 1.3K times.
About CASIA-LM/ChineseWebText
ChineseWebText: Large-Scale High-quality Chinese Web Text Extracted with Effective Evaluation Model
This directory contains the ChineseWebText dataset and the EvalWeb toolchain for processing CommonCrawl data. Our EvalWeb tool is publicly availab...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 1K<n<10K
- Creator: CASIA-LM
- Year: 2023
- Downloads: 1311
- Likes: 44