HuggingFaceFW/finepdfs
Text Generation · AAI, AAK, AAU · odc-by
HuggingFaceFW/finepdfs is a text generation dataset created by HuggingFaceFW in 2025. It lists AAI, AAK, and AAU as languages and is distributed in Parquet format. It has 80.3K downloads and 882 likes, is released under the odc-by license, and falls in the 100M<n<1B size category.
About HuggingFaceFW/finepdfs
Liberating 3T of the finest tokens from PDFs
What is this?
As we run out of web pages to process, the natural question has always been: what to do next? Only a few knew about a data source that everyone avoided for ages, due to its inc...
Details
- Task: Text Generation
- Language: AAI, AAK, AAU
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: HuggingFaceFW
- Year: 2025
- License: odc-by
- Downloads: 80256
- Likes: 882
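Because the dataset sits in the 100M<n<1B size category and ships as Parquet, streaming a few rows is usually more practical than downloading everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed. The helper name `peek`, the `config_name` argument, and the `"train"` split are illustrative assumptions that this page does not confirm; check the dataset card for the actual subset and split names.

```python
from itertools import islice

# Repository ID as listed on this page.
DATASET_ID = "HuggingFaceFW/finepdfs"


def peek(config_name, n=3, split="train"):
    """Return the first n rows of one subset without a full download.

    config_name and split are assumptions for illustration; the real
    subset names must be taken from the dataset card.
    """
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields rows on demand instead of fetching all Parquet files.
    stream = load_dataset(DATASET_ID, name=config_name, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek("<subset name>")` needs network access and returns a list of row dictionaries; the columns depend on the dataset's schema, which this page does not describe.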