oscar-corpus/OSCAR-2109

General NLP | AF, ALS, GSW

The oscar-corpus/OSCAR-2109 dataset is a General NLP resource covering AF, ALS, and GSW, published by oscar-corpus in 2022.

About oscar-corpus/OSCAR-2109

The Open Super-large Crawled Aggregated coRpus (OSCAR) is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.

Details

Task: General NLP
Language: AF, ALS, GSW
Format: Parquet
Rows / instances: N/A
Creator: oscar-corpus
Year: 2022