google/wiki40b
General NLP | EN
The google/wiki40b dataset is a general NLP dataset in English (EN) that provides 18,126,525 examples distributed in Parquet format. It falls in the 10M<n<100M size category and has been downloaded 10.7K times.
About google/wiki40b
Dataset Card for "wiki40b"
Dataset Summary
Cleaned-up text from 40+ Wikipedia language editions, covering pages that
correspond to entities. The datasets have train/dev/test splits per language.
The dataset is cleaned up by page filtering to remov...
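As a rough illustration of the per-language train/dev/test splits, here is a minimal Python sketch for loading one split with the Hugging Face `datasets` library. The helper names `resolve_split` and `load_wiki40b` are hypothetical. The sketch assumes that the language code (for example `en`) is the configuration name and that the "dev" split is exposed on the Hub as `validation`; neither is stated on this page, so check both against the dataset card.

```python
# Assumption: the card's "dev" split is published under the name "validation".
SPLIT_ALIASES = {"train": "train", "dev": "validation", "test": "test"}


def resolve_split(name):
    """Map the card's split names (train/dev/test) to assumed Hub split names."""
    try:
        return SPLIT_ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown split {name!r}; expected one of {sorted(SPLIT_ALIASES)}")


def load_wiki40b(language="en", split="dev", streaming=True):
    """Load one language/split pair (hypothetical helper, needs network access).

    Assumes the language code is the dataset configuration name.
    streaming=True avoids downloading every Parquet file up front.
    """
    from datasets import load_dataset  # imported lazily; requires `pip install datasets`

    return load_dataset(
        "google/wiki40b",
        language,
        split=resolve_split(split),
        streaming=streaming,
    )
```

Calling `load_wiki40b("en", "dev")` should return an iterable dataset; inspecting the first record with `next(iter(...))` is a quick way to confirm the actual column names before relying on them.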
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: 18,126,525
- Size: 10M<n<100M
- Creator: not specified
- Year: 2022
- Downloads: 10,715
- Likes: 36
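The size category and the abbreviated download count follow from the raw numbers in the details list. The sketch below shows one way to derive them. The function names are hypothetical, and the bucket boundaries are an assumption based on the common powers-of-ten convention for dataset size tags, simplified above one billion rows.

```python
# Assumed bucket boundaries (powers of ten); the top bucket is simplified.
SIZE_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
]


def size_category(n_rows):
    """Return the first bucket whose upper bound exceeds n_rows."""
    for upper, label in SIZE_BUCKETS:
        if n_rows < upper:
            return label
    return "n>1B"  # simplification: everything at or above one billion rows


def short_count(n):
    """Abbreviate a count to one decimal place with a K or M suffix."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)


print(size_category(18_126_525))  # row count from the details list
print(short_count(10_715))        # download count from the details list
```

With the figures listed above, this reproduces the `10M<n<100M` category and the "10.7K" download figure quoted in the summary.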