google/wit

Text Retrieval, Image To Text | AF, AR, AST | cc-by-sa-3.0

google/wit is a dataset for text retrieval and image-to-text tasks, covering the languages AF, AR, and AST and distributed in Parquet format. It is released under the cc-by-sa-3.0 license, falls in the 1M<n<10M size category, and has been downloaded 269 times.

About google/wit

The Wikipedia-based Image Text (WIT) Dataset is a large multimodal, multilingual dataset. WIT is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables ...

Details

Task: Text Retrieval, Image To Text
Language: AF, AR, AST
Format: Parquet
Rows / instances: N/A
Size: 1M<n<10M
Creator: google
Year: 2022
License: cc-by-sa-3.0
Downloads: 269
Likes: 69

FAQ