CASIA-LM/ChineseWebText2.0
General NLP, English, apache-2.0
CASIA-LM/ChineseWebText2.0 is a General NLP dataset distributed in Parquet format under the apache-2.0 license. Its metadata lists the language as English, although the dataset's own title describes the content as Chinese web text. It falls in the 1K<n<10K size category and has been downloaded 2,877 times.
About CASIA-LM/ChineseWebText2.0
ChineseWebText 2.0: Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information
This directory contains the ChineseWebText2.0 dataset, and a new tool-chain called MDFG-tool for constructing large-scale and high...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 1K<n<10K
- Creator: CASIA-LM
- Year: 2024
- License: apache-2.0
- Downloads: 2877
- Likes: 32
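
Since the listing gives a repository ID and a Parquet format, a minimal sketch of previewing a few records with the Hugging Face `datasets` library might look like the following. The split name `"train"`, the use of streaming mode, and the helper name `peek_records` are assumptions for illustration and are not taken from the dataset card; the actual splits, configurations, and field names should be checked on the dataset page.

```python
from itertools import islice

# Repository ID as given in the listing above.
REPO_ID = "CASIA-LM/ChineseWebText2.0"


def peek_records(n=3, split="train"):
    """Return the first `n` records of the dataset as a list of dicts.

    Hypothetical helper: the split name "train" is an assumption,
    not something the dataset card states.
    """
    # Imported here so the module can be loaded without the library installed.
    from datasets import load_dataset

    # streaming=True reads records lazily instead of downloading
    # the whole dataset before returning.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek_records(2)` and inspecting the keys of each returned record is one way to discover the available fields without downloading the full dataset.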