facebook/recycling_the_web

General NLP | EN | cc-by-nc-4.0

facebook/recycling_the_web is an English-language (EN) general NLP dataset distributed in Parquet format under the cc-by-nc-4.0 license. It falls in the 10M<n<100M size category and has been downloaded 1,436 times.

About facebook/recycling_the_web

Dataset Card for Recycling-The-Web Synthetic Data

We release 44.4B tokens of high-quality, model-filtered synthetic texts obtained via our REcycling the Web with guIded REwrite (REWIRE) approach. The generation process involves taking all docum...

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 10M<n<100M
Creator: facebook
Year: 2025
License: cc-by-nc-4.0
Downloads: 1436
Likes: 68