
google/wiki40b

General NLP · EN

The google/wiki40b dataset is a General NLP dataset tagged EN (English) that provides 18,126,525 examples distributed in Parquet format. It falls in the 10M<n<100M size category and has been downloaded 10.7K times.
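The size category follows directly from the row count by power-of-ten bucketing. Below is a minimal sketch, assuming the Hugging Face Hub's power-of-ten size-category labels. The exact bucket list, the choice to treat each lower bound as inclusive, and the `size_category` helper itself are illustrative assumptions, not the Hub's own code.

```python
# Hypothetical helper: maps a row count to a Hugging Face-style size-category label.
# Assumption: power-of-ten buckets, with each lower bound treated as inclusive.
BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n: int) -> str:
    """Return the label of the first bucket whose upper bound exceeds n."""
    if n < 0:
        raise ValueError("row count must be non-negative")
    for upper, label in BUCKETS:
        if n < upper:
            return label
    return "n>1T"


if __name__ == "__main__":
    rows = 18_126_525  # row count listed on this page
    print(size_category(rows))  # 10M<n<100M
```

With 18,126,525 rows, the count is at least 10,000,000 but below 100,000,000, so it lands in the 10M<n<100M bucket shown on this page.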

About google/wiki40b

Dataset Card for "wiki40b"

Dataset Summary

Cleaned-up text from the Wikipedia editions of 40+ languages, restricted to pages that correspond to entities. The datasets have train/dev/test splits per language. The dataset is cleaned up by page filtering to remov...

Details

Task
General NLP
Language
EN
Format
Parquet
Rows / instances
18,126,525
Size category
10M<n<100M
Creator
google
Year
2022
Downloads
10715
Likes
36
