juliozhao/DocSynth300K
General NLP, English
juliozhao/DocSynth300K is a General NLP dataset in English, published by juliozhao in Parquet format. It falls in the 100K<n<1M size category and has been downloaded 565 times.
About juliozhao/DocSynth300K
DocSynth300K is a large-scale, diverse pre-training dataset for document layout analysis. Pre-training on it can substantially improve model performance.
Data Download
Use the following command to download the dataset (about 113 GB):
from huggingface_hub import snapsho...
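The command above is truncated in this listing. As a minimal sketch of what a full download could look like, the following assumes the `snapshot_download` function from `huggingface_hub` with `repo_type="dataset"`. The target directory `./docsynth300k` and the helper names `download_kwargs` and `download` are hypothetical and are not taken from the original command.

```python
from pathlib import Path

# Repository ID as given in this listing.
REPO_ID = "juliozhao/DocSynth300K"


def download_kwargs(local_dir="./docsynth300k"):
    """Build the keyword arguments for snapshot_download.

    local_dir is a hypothetical target directory chosen for illustration.
    """
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # the repository is a dataset, not a model
        "local_dir": str(Path(local_dir)),
    }


def download(local_dir="./docsynth300k"):
    """Download the whole dataset repository and return its local path."""
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))


# Uncomment to start the download (about 113 GB according to this page):
# print(download())
```

Because the download is large, it is worth checking free disk space at the target directory before calling `download()`.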
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 100K<n<1M
- Creator: juliozhao
- Year: 2024
- Downloads: 565
- Likes: 55