commoncrawl/gneissweb-annotation-url-testing-v1
General NLP, English
The commoncrawl/gneissweb-annotation-url-testing-v1 dataset is an English General NLP resource published by commoncrawl in 2025. It has 18.9K downloads and 0 likes, and falls in the 10B<n<100B size category.
About commoncrawl/gneissweb-annotation-url-testing-v1
GneissWeb Annotations
GneissWeb Annotations, powered by IBM Research's GneissWeb methodology, is a dataset of quality and category annotations applied to the Common Crawl corpus.
This dataset enables precise filtering of web content across medi...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: commoncrawl
- Year: 2025
- Downloads: 18936
- Likes: 0
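
The description above says the annotations are meant for filtering web content by quality and category. A minimal sketch of that kind of filter is shown below, assuming each annotation row is a dictionary. The field names `quality_score` and `category`, the helper `filter_annotations`, and the toy records are all hypothetical and are not taken from the dataset's actual schema, which should be checked before adapting this.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical column names; check the dataset's real schema before use.
QUALITY_FIELD = "quality_score"
CATEGORY_FIELD = "category"


def filter_annotations(
    rows: Iterable[dict],
    min_quality: float,
    categories: Optional[set] = None,
) -> Iterator[dict]:
    """Yield rows that meet the quality threshold and, if given, an allowed category."""
    for row in rows:
        score = row.get(QUALITY_FIELD)
        # Skip rows with no score or a score below the threshold.
        if score is None or score < min_quality:
            continue
        # Skip rows outside the allowed categories, when a set is supplied.
        if categories is not None and row.get(CATEGORY_FIELD) not in categories:
            continue
        yield row


# Made-up toy records, for illustration only (not real dataset rows).
sample_rows = [
    {"url": "https://example.com/a", "quality_score": 0.91, "category": "science"},
    {"url": "https://example.com/b", "quality_score": 0.12, "category": "science"},
    {"url": "https://example.com/c", "quality_score": 0.88, "category": "sports"},
    {"url": "https://example.com/d", "category": "science"},
]

kept = list(filter_annotations(sample_rows, min_quality=0.5, categories={"science"}))
print([row["url"] for row in kept])
```

Because the function is a generator over plain dictionaries, it should also work on rows streamed from the Parquet files, for example through the Hugging Face `datasets` library with `load_dataset("commoncrawl/gneissweb-annotation-url-testing-v1", streaming=True)`. The available split names and columns are not stated here, so they would need to be confirmed first.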