disco-eth/WorldSpeech

Automatic Speech Recognition | Text To Speech | Audio Classification | AF, AM, AR | cc-by-nc-4.0

Created by disco-eth in 2026, disco-eth/WorldSpeech is an automatic speech recognition dataset covering AF, AM, and AR, distributed in Parquet format. It has 27.3K downloads and 20 likes, is released under the cc-by-nc-4.0 license, and falls in the 10M<n<100M size category.
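The size label `10M<n<100M` is compact but easy to misread. As a minimal sketch, assuming `n` denotes an item count and that `K`, `M`, `B`, and `T` are the usual decimal suffixes (the listing itself reports rows as N/A, so this reading is an assumption), the label can be turned into numeric bounds. The helper name `parse_size_bucket` is hypothetical and not part of any library.

```python
import re

# Assumed decimal multipliers for the suffixes used in size labels.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_bucket(label: str) -> tuple[int, int]:
    """Return (lower, upper) exclusive bounds for a label like '10M<n<100M'."""
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized size bucket: {label!r}")
    lo_num, lo_suf, hi_num, hi_suf = match.groups()
    return int(lo_num) * _SUFFIX[lo_suf], int(hi_num) * _SUFFIX[hi_suf]


lower, upper = parse_size_bucket("10M<n<100M")
print(lower, upper)  # 10000000 100000000
```

Under these assumptions the label places the dataset between ten million and one hundred million items; it gives a range, not an exact count.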

About disco-eth/WorldSpeech

WorldSpeech is a multilingual ASR dataset containing over 65k hours of human-transcribed speech across 127 language-region variants, drawn from national parliaments, public broadcasters, public-domain audiobooks, and international institutions. Ro...

Details

Task: Automatic Speech Recognition, Text To Speech, Audio Classification
Language: AF, AM, AR
Format: Parquet
Rows / instances: N/A
Size: 10M<n<100M
Creator: disco-eth
Year: 2026
License: cc-by-nc-4.0
Downloads: 27317
Likes: 20