
HuggingFaceFW/finepdfs

Text Generation · AAI, AAK, AAU · odc-by

Created by HuggingFaceFW in 2025, HuggingFaceFW/finepdfs is a text generation dataset listed under the language codes AAI, AAK, and AAU and distributed in Parquet format. It has 80.3K downloads and 882 likes, is released under the odc-by license, and falls in the 100M<n<1B size category.

About HuggingFaceFW/finepdfs

Liberating 3T of the finest tokens from PDFs

What is this?

As we run out of web pages to process, the natural question has always been: what to do next? Only a few knew about a data source that everyone avoided for ages, due to its inc...

Details

Task: Text Generation
Language: AAI, AAK, AAU
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: HuggingFaceFW
Year: 2025
License: odc-by
Downloads: 80256
Likes: 882
