oscar-corpus/OSCAR-2109
General NLP · AF, ALS, GSW
The oscar-corpus/OSCAR-2109 dataset is a General NLP resource covering AF, ALS, and GSW, published by oscar-corpus in 2022.
About oscar-corpus/OSCAR-2109
The Open Super-large Crawled Aggregated coRpus (OSCAR) is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
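The "language classification and filtering" step described above can be illustrated with a minimal sketch. This is not the goclassy implementation: the classifier here is a toy word-list scorer standing in for a real language-identification model, and the names `toy_classify`, `filter_by_language`, `min_conf`, and `min_chars` are hypothetical, chosen only for illustration. The thresholds are likewise assumptions, since the source does not state the values the pipeline uses.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical stand-in for a real language-identification model.
# Scores a line by the share of its tokens found in a tiny word list.
TOY_LEXICON = {
    "af": {"die", "en", "is", "nie", "het"},
    "en": {"the", "and", "is", "not", "has"},
}


def toy_classify(line: str) -> Tuple[str, float]:
    """Return (language, confidence) for one line using the toy lexicon."""
    tokens = line.lower().split()
    if not tokens:
        return ("und", 0.0)  # "und" = undetermined
    best_lang, best_score = "und", 0.0
    for lang, words in TOY_LEXICON.items():
        score = sum(t in words for t in tokens) / len(tokens)
        if score > best_score:
            best_lang, best_score = lang, score
    return (best_lang, best_score)


def filter_by_language(
    lines: Iterable[str],
    target: str,
    classify: Callable[[str], Tuple[str, float]] = toy_classify,
    min_conf: float = 0.5,   # assumed threshold, not taken from the source
    min_chars: int = 10,     # assumed minimum line length, not from the source
) -> Iterator[str]:
    """Yield lines classified as `target` with enough confidence."""
    for line in lines:
        text = line.strip()
        if len(text) < min_chars:
            continue  # drop very short lines before classifying
        lang, conf = classify(text)
        if lang == target and conf >= min_conf:
            yield text


crawl_lines = ["Die kat is nie hier nie", "The dog is not here", "ok"]
kept = list(filter_by_language(crawl_lines, target="af"))
```

Passing the classifier in as a parameter keeps the filtering logic independent of the model, so a real language-identification model could replace `toy_classify` without changing `filter_by_language`.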
Details
- Task: General NLP
- Language: AF, ALS, GSW
- Format: Parquet
- Rows / instances: N/A
- Creator: oscar-corpus
- Year: 2022