
approximatelabs/tablib-v1-full

General NLP, English

Created by approximatelabs in 2023, approximatelabs/tablib-v1-full is an English-language General NLP dataset distributed in Parquet format. It has 23.6K downloads and 67 likes. It is released under a license listed as "other" and falls in the 10B<n<100B size category.

About approximatelabs/tablib-v1-full

TabLib: a minimally preprocessed dataset of 627M tables (69 TiB) extracted from HTML, PDF, CSV, TSV, Excel, and SQLite files from GitHub and Common Crawl. This includes 867B tokens of "context metadata": each table includes provenance informatio...
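To put the headline figures in perspective, the short calculation below divides the stated totals by the table count. It is a back-of-the-envelope sketch that assumes "TiB" means 2^40 bytes and treats the rounded totals (627M tables, 69 TiB, 867B tokens) as exact. The constant and function names are illustrative only. Individual tables will differ widely from these averages.

```python
# Rough per-table averages derived from the headline TabLib figures.
# Assumption: "TiB" means 2**40 bytes and the rounded totals are treated
# as exact; real tables vary widely in size, so these are only averages.

TABLES = 627_000_000               # number of tables
TOTAL_BYTES = 69 * 2**40           # 69 TiB expressed in bytes
CONTEXT_TOKENS = 867_000_000_000   # tokens of "context metadata"


def per_table_averages(tables: int, total_bytes: int, tokens: int) -> tuple[float, float]:
    """Return (average bytes per table, average context tokens per table)."""
    return total_bytes / tables, tokens / tables


avg_bytes, avg_tokens = per_table_averages(TABLES, TOTAL_BYTES, CONTEXT_TOKENS)
print(f"~{avg_bytes / 1024:.0f} KiB per table")
print(f"~{avg_tokens:.0f} context tokens per table")
```

With these inputs the script prints roughly 118 KiB of data and about 1,383 context-metadata tokens per table on average.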

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: approximatelabs
Year: 2023
License: other
Downloads: 23,555
Likes: 67
