disco-eth/WorldSpeech
Automatic Speech Recognition, Text To Speech, Audio Classification | AF, AM, AR | cc-by-nc-4.0
Created by disco-eth in 2026, disco-eth/WorldSpeech is an automatic speech recognition dataset in AF, AM, and AR, distributed in Parquet format. It has 27.3K downloads and 20 likes. It is released under the cc-by-nc-4.0 license and falls in the 10M<n<100M size category.
About disco-eth/WorldSpeech
WorldSpeech
A multilingual ASR dataset containing over 65k hours of human-transcribed speech across 127 language-region variants, drawn from national parliaments, public broadcasters, public-domain audiobooks, and international institutions. Ro...
Details
- Task: Automatic Speech Recognition, Text To Speech, Audio Classification
- Language: AF, AM, AR
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: disco-eth
- Year: 2026
- License: cc-by-nc-4.0
- Downloads: 27317
- Likes: 20
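
The Size field above is a bucket rather than an exact count, and the exact row count is listed as N/A. As a minimal sketch, the following Python shows one way to turn a size-category string such as 10M<n<100M into numeric bounds. The function names (parse_size_category, in_category) and the suffix table are hypothetical helpers introduced here for illustration, and the bounds are treated as strict because the notation uses "<"; none of this comes from the dataset itself.

```python
import re
from typing import Tuple

# Assumed multipliers for the suffixes seen in size-category strings.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def _to_int(token: str) -> int:
    """Convert a token such as '10M' or '100' to an integer."""
    match = re.fullmatch(r"(\d+)([KMBT]?)", token.strip())
    if match is None:
        raise ValueError(f"unrecognized size token: {token!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def parse_size_category(category: str) -> Tuple[int, int]:
    """Return the (lower, upper) bounds of a 'LOW<n<HIGH' string."""
    parts = category.split("<")
    if len(parts) != 3 or parts[1].strip() != "n":
        raise ValueError(f"unrecognized size category: {category!r}")
    return _to_int(parts[0]), _to_int(parts[2])


def in_category(n: int, category: str) -> bool:
    """Check whether n lies strictly between the category bounds."""
    low, high = parse_size_category(category)
    return low < n < high


low, high = parse_size_category("10M<n<100M")
print(low, high)  # prints: 10000000 100000000
```

Read this way, the listing says only that the dataset holds more than 10 million and fewer than 100 million instances; it does not give the actual number.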