hotchpotch/fineweb-2-edu-japanese

General NLP, JA, odc-by

Created by hotchpotch in 2025, hotchpotch/fineweb-2-edu-japanese is a General NLP dataset in Japanese (JA) containing 262,282,564 records in Parquet format. It has 11.2K downloads and 32 likes, is released under the odc-by license, and falls in the 100M<n<1B size category.

About hotchpotch/fineweb-2-edu-japanese

🍷 FineWeb2 Edu Japanese: High-Quality Educational Japanese Dataset

This dataset consists of 120 million texts (approximately 89.3B tokens) that were deemed educational, filtered from the 376 million Japanese texts in FineWeb2. The following sub...
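To put the figures in the description in proportion, the short sketch below works out what share of the FineWeb2 Japanese texts was kept and roughly how long a kept text is on average. It uses only the three numbers quoted in the description, all of which are approximate; the variable names are illustrative and not part of the dataset.

```python
# Figures quoted in the dataset description (all approximate).
SOURCE_TEXTS = 376_000_000      # Japanese texts in FineWeb2
KEPT_TEXTS = 120_000_000        # texts deemed educational
KEPT_TOKENS = 89_300_000_000    # approximate tokens in the kept texts

# Share of the source texts that survived the educational filter.
retained_fraction = KEPT_TEXTS / SOURCE_TEXTS

# Rough average length of a kept text, in tokens.
avg_tokens_per_text = KEPT_TOKENS / KEPT_TEXTS

print(f"retained: {retained_fraction:.1%}")                    # retained: 31.9%
print(f"average tokens per text: {avg_tokens_per_text:.0f}")   # average tokens per text: 744
```

Under these rounded inputs, a little under a third of the source texts were kept, at roughly 744 tokens per text.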

Details

Task: General NLP
Language: JA
Format: Parquet
Rows / instances: 262,282,564
Size: 100M<n<1B
Creator: hotchpotch
Year: 2025
License: odc-by
Downloads: 11,173
Likes: 32