HAERAE-HUB/KOREAN-WEBTEXT

General NLP | KO

Created by HAERAE-HUB in 2024, HAERAE-HUB/KOREAN-WEBTEXT is a general NLP dataset in Korean (KO) containing 1,284,879 records in Parquet format. It has 476 downloads and 47 likes, and it falls in the 1M<n<10M size category.

About HAERAE-HUB/KOREAN-WEBTEXT

KOREAN-WEBTEXT is a high-quality Korean language corpus consisting of 2.2 billion tokens. The data has been collected from the following sources: cc100, oscar-corpus/OSCAR-2201, oscar-corpus/OSCAR-2109, oscar-corpus/OSCAR-2301, onto...
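
Because the corpus is large, it can be convenient to inspect a few records before downloading everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the dataset exposes a `train` split (the split name is an assumption, not stated on this page). The helper name `peek` is hypothetical; only the repository ID comes from the listing above.

```python
from itertools import islice

# Repository ID as given on this page.
REPO_ID = "HAERAE-HUB/KOREAN-WEBTEXT"


def peek(n=3, split="train"):
    """Return the first n records, streamed rather than fully downloaded.

    Hypothetical helper: the "train" split name is an assumption.
    """
    # Imported lazily so the module loads even without the library installed.
    from datasets import load_dataset  # pip install datasets

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` requires network access. Printing `record.keys()` for each returned record shows the available columns, which this page does not list.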

Details

Task: General NLP
Language: KO
Format: Parquet
Rows / instances: 1,284,879
Size: 1M<n<10M
Creator: HAERAE-HUB
Year: 2024
Downloads: 476
Likes: 47