facebook/recycling_the_web
General NLP · EN · cc-by-nc-4.0
facebook/recycling_the_web is an English-language (EN) general NLP dataset distributed in Parquet format under the cc-by-nc-4.0 license. It falls in the 10M<n<100M size category and has been downloaded about 1.4K times.
About facebook/recycling_the_web
Dataset Card for Recycling-The-Web Synthetic Data
We release 44.4B tokens of high-quality, model-filtered synthetic texts obtained via our REcycling the Web with guIded REwrite (REWIRE) approach.
The generation process involves taking all docum...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator:
- Year: 2025
- License: cc-by-nc-4.0
- Downloads: 1436
- Likes: 68
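Because the dataset is large (10M<n<100M) and stored as Parquet, streaming a few records is usually more practical than downloading everything. The following is a minimal sketch, not an official loading recipe. It assumes the dataset is hosted on the Hugging Face Hub under the id `facebook/recycling_the_web`, that the `datasets` package is installed, that no explicit config name is required, and that a split named `train` exists. The helper names `stream_recycling_the_web` and `preview` are hypothetical and introduced only for illustration.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def stream_recycling_the_web(split: str = "train") -> Iterator[dict]:
    """Lazily iterate over records instead of downloading the full dataset.

    Assumptions (not confirmed by the listing above): the repository loads
    without an explicit config name and exposes a split called "train".
    """
    # Imported lazily so the module can be loaded without `datasets` installed;
    # calling this function requires the package and network access.
    from datasets import load_dataset

    return iter(
        load_dataset("facebook/recycling_the_web", split=split, streaming=True)
    )


def preview(records: Iterable[Any], n: int = 3) -> List[Any]:
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(records, n))
```

Calling `preview(stream_recycling_the_web(), n=2)` would fetch only the first two records. The column names are not listed on this page, so inspecting `record.keys()` on a fetched record is safer than assuming a field name. Note that the cc-by-nc-4.0 license restricts use to non-commercial purposes.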