hotchpotch/fineweb-2-edu-japanese
General NLP, JA, odc-by
hotchpotch/fineweb-2-edu-japanese is a General NLP dataset in Japanese (JA), created by hotchpotch in 2025. It contains 262,282,564 records in Parquet format, which places it in the 100M<n<1B size category, and it is released under the odc-by license. It has 11.2K downloads and 32 likes.
About hotchpotch/fineweb-2-edu-japanese
🍷 FineWeb2 Edu Japanese: High-Quality Educational Japanese Dataset
This dataset consists of 120 million texts (approximately 89.3B tokens) that were deemed educational, filtered from the 376 million Japanese texts in FineWeb2. The following sub...
Details
- Task: General NLP
- Language: JA
- Format: Parquet
- Rows / instances: 262,282,564
- Size: 100M<n<1B
- Creator: hotchpotch
- Year: 2025
- License: odc-by
- Downloads: 11,173
- Likes: 32
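
The counts quoted in the description imply two quantities it does not state directly: the share of FineWeb2's Japanese texts that passed the educational filter, and the average length of a retained text. The sketch below derives both using only the rounded figures from the description. The constant and function names are illustrative, and the tokenizer behind the 89.3B figure is not given here, so the per-text average should be read as approximate.

```python
# Rounded figures quoted in the dataset description.
SOURCE_TEXTS = 376_000_000      # Japanese texts in FineWeb2
KEPT_TEXTS = 120_000_000        # texts deemed educational and retained
KEPT_TOKENS = 89_300_000_000    # approximate token count of the retained texts


def retention_rate(kept: int, total: int) -> float:
    """Fraction of the source texts that survived filtering."""
    return kept / total


def mean_tokens_per_text(tokens: int, texts: int) -> float:
    """Average number of tokens per retained text."""
    return tokens / texts


if __name__ == "__main__":
    print(f"retained: {retention_rate(KEPT_TEXTS, SOURCE_TEXTS):.1%}")
    print(f"mean tokens per text: {mean_tokens_per_text(KEPT_TOKENS, KEPT_TEXTS):.0f}")
```

With these inputs, roughly 32% of the source texts are retained, at about 744 tokens per text on average. Both values inherit the rounding of the quoted counts.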