
SetFit/20_newsgroups

General NLP, English

SetFit/20_newsgroups is an English-language dataset for general NLP tasks, distributed in Parquet format.

About SetFit/20_newsgroups

This is a version of the 20 newsgroups dataset that is provided in Scikit-learn. From the Scikit-learn docs: The 20 newsgroups dataset comprises around 18000 newsgroups posts on 20 topics split in two subsets: one for training (or development) an...
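As a rough illustration of how a dataset like this might be loaded and inspected, here is a minimal sketch. It assumes that SetFit/20_newsgroups is a Hugging Face Hub identifier, that the third-party `datasets` package is installed, that a split named "train" exists, and that the topic name is stored in a column called `label_text`. None of these details are stated on this page, and the helper names `load_newsgroups` and `count_topics` are hypothetical.

```python
from collections import Counter


def load_newsgroups(split="train"):
    """Load one split of SetFit/20_newsgroups.

    Assumptions: the `datasets` package is installed, the identifier
    resolves on the Hugging Face Hub, and a split called `split` exists.
    Network access is required.
    """
    # Imported lazily so count_topics() below can be used without `datasets`.
    from datasets import load_dataset

    return load_dataset("SetFit/20_newsgroups", split=split)


def count_topics(rows, label_key="label_text"):
    """Count posts per topic.

    `rows` is any iterable of dict-like records; `label_key` is an
    assumed column name, so adjust it to match the actual schema.
    """
    return Counter(row[label_key] for row in rows)
```

With network access, `count_topics(load_newsgroups("train")).most_common()` would list topics by number of posts, provided the assumed split and column names match the actual data.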

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: SetFit
Year: 2022