approximatelabs/tablib-v1-full
General NLP, English
Created by approximatelabs in 2023, approximatelabs/tablib-v1-full is an English-language General NLP dataset in Parquet format. It has 23.6K downloads and 67 likes. Its license is listed as "other", and it falls in the 10B<n<100B size category.
About approximatelabs/tablib-v1-full
TabLib
A minimally-preprocessed dataset of 627M tables (69 TiB) extracted from HTML, PDF, CSV, TSV, Excel, and SQLite files from GitHub and Common Crawl.
This includes 867B tokens of "context metadata": each table includes provenance informatio...
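To give a feel for the scale, the figures quoted above can be turned into rough per-table averages. This is a back-of-the-envelope sketch only: the function name `per_table_averages` is hypothetical, and it assumes the 69 TiB and 867B-token totals are spread over all 627M tables, which says nothing about how sizes are actually distributed.

```python
# Figures quoted in the dataset description above.
NUM_TABLES = 627_000_000          # "627M tables"
TOTAL_BYTES = 69 * 2**40          # "69 TiB" (1 TiB = 2**40 bytes)
CONTEXT_TOKENS = 867_000_000_000  # "867B tokens" of context metadata


def per_table_averages(num_tables, total_bytes, context_tokens):
    """Return (average KiB per table, average context tokens per table)."""
    avg_kib = total_bytes / num_tables / 1024
    avg_tokens = context_tokens / num_tables
    return avg_kib, avg_tokens


avg_kib, avg_tokens = per_table_averages(NUM_TABLES, TOTAL_BYTES, CONTEXT_TOKENS)
print(f"~{avg_kib:.0f} KiB and ~{avg_tokens:.0f} context tokens per table")
```

Under these assumptions the averages come out on the order of a hundred KiB and a little over a thousand context tokens per table. Real tables will vary widely around those means.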
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: approximatelabs
- Year: 2023
- License: other
- Downloads: 23555
- Likes: 67