Jr23xd23/ArabicText-Large
Text Generation, Fill Mask, Text Classification, AR, apache-2.0
Jr23xd23/ArabicText-Large is an Arabic (AR) dataset focused on text generation and distributed in Parquet format. It is released under the apache-2.0 license, falls in the 100K<n<1M size category, and has been downloaded 413 times.
About Jr23xd23/ArabicText-Large
ArabicText-Large: High-Quality Arabic Corpus for LLM Training
Dataset Summary
ArabicText-Large is a comprehensive, high-quality Arabic text corpus comprising 743,288 articles with over 244 million words, specifically curated fo...
Details
- Task: Text Generation, Fill Mask, Text Classification
- Language: AR
- Format: Parquet
- Rows / instances: N/A
- Size: 100K<n<1M
- Creator: Jr23xd23
- Year: 2025
- License: apache-2.0
- Downloads: 413
- Likes: 69
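
As a rough illustration of how the figures on this page fit together, the sketch below checks the article count from the summary against the listed size category and estimates a lower bound on average article length. It also shows one possible way to stream the corpus with the Hugging Face `datasets` library. The helper names are hypothetical, and the `"train"` split name is an assumption that this page does not confirm.

```python
DATASET_ID = "Jr23xd23/ArabicText-Large"  # repository id as listed on this page

# Figures quoted in the dataset summary.
ARTICLES = 743_288
WORDS_LOWER_BOUND = 244_000_000  # "over 244 million words", so a lower bound


def in_size_category(n, low=100_000, high=1_000_000):
    """Return True if n falls strictly inside the 100K<n<1M size category."""
    return low < n < high


def mean_words_per_article(words=WORDS_LOWER_BOUND, articles=ARTICLES):
    """Rough lower bound on the average article length, in words."""
    return words / articles


def stream_dataset(split="train"):
    """Stream the corpus lazily instead of downloading it in full.

    Assumption: the split is named "train"; this page does not list splits.
    Requires the third-party `datasets` package.
    """
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    return load_dataset(DATASET_ID, split=split, streaming=True)


if __name__ == "__main__":
    print(in_size_category(ARTICLES))        # True
    print(round(mean_words_per_article()))   # 328
```

Streaming is used here because the corpus is large; column names are not listed on this page, so inspect the first record (for example with `next(iter(stream_dataset()))`) before relying on any particular field.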