commoncrawl/gneissweb-annotation-url-testing-v1

General NLP, English

The commoncrawl/gneissweb-annotation-url-testing-v1 dataset is an English General NLP resource published by commoncrawl in 2025. It has 18.9K downloads and 0 likes, and it falls in the 10B<n<100B size category.

About commoncrawl/gneissweb-annotation-url-testing-v1

GneissWeb Annotations, powered by IBM Research's GneissWeb methodology, is a dataset of quality and category annotations applied to the Common Crawl corpus. This dataset enables precise filtering of web content across medi...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: commoncrawl
Year: 2025
Downloads: 18936
Likes: 0
