juliozhao/DocSynth300K

General NLP · English

juliozhao/DocSynth300K is a General NLP dataset in English, published by juliozhao in Parquet format. It falls in the 100K<n<1M size category and has been downloaded 565 times.

About juliozhao/DocSynth300K

DocSynth300K is a large-scale, diverse pre-training dataset for document layout analysis that can substantially boost model performance. Data Download: use the following command to download the dataset (about 113G): from huggingface_hub import snapsho...
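
The download command in the description above is cut off, so the original call is not recoverable from this page. As a minimal sketch, assuming the truncated import refers to `snapshot_download` from `huggingface_hub`, a full download might look like the following. The helper names and the `./docsynth300k` target directory are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def docsynth_download_kwargs(local_dir: str = "./docsynth300k") -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": "juliozhao/DocSynth300K",  # repository name shown on this page
        "repo_type": "dataset",               # dataset repos need this set explicitly
        "local_dir": str(Path(local_dir)),    # hypothetical target directory
    }


def download_docsynth(local_dir: str = "./docsynth300k") -> str:
    """Download the whole repository snapshot and return its local path.

    Assumption: the truncated command in the description is snapshot_download.
    The description states the dataset is about 113G, so check disk space first.
    """
    # Imported lazily so the kwargs helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**docsynth_download_kwargs(local_dir))
```

Calling `download_docsynth("/data/docsynth300k")` would start the transfer. Keeping the arguments in a separate helper makes it possible to inspect them before committing to a download of this size.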

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100K<n<1M
Creator: juliozhao
Year: 2024
Downloads: 565
Likes: 55