pixparse/idl-wds
Image To Text, English
Created by pixparse in 2023, pixparse/idl-wds is an image-to-text dataset in English, distributed in Parquet format.
About pixparse/idl-wds
Dataset Card for Industry Documents Library (IDL)
Dataset Summary
Industry Documents Library (IDL) is a document dataset filtered from the UCSF documents library, with 19 million pages kept as valid samples.
Each document exists as a coll...
Details
- Task: Image To Text
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: pixparse
- Year: 2023