
alvanlii/cantonese-youtube

Automatic Speech Recognition | Audio Classification | ZH, YUE

The alvanlii/cantonese-youtube dataset is an automatic speech recognition and audio classification resource in Chinese (ZH) and Cantonese (YUE), published by alvanlii in 2024. It comprises 1,478,373 examples, which places it in the 1M<n<10M size category, and it has 551 downloads and 51 likes.

About alvanlii/cantonese-youtube

Cantonese Youtube Pseudo-Transcription Dataset

Contains approximately 10k hours of audio sourced from YouTube.
Videos are chosen at random and scraped on a channel basis.
Includes news, vlogs, entertainment, stories, and health content.

Columns transcript...
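
The two headline figures on this page (approximately 10k hours of audio, 1,478,373 examples) can be combined into a rough estimate of the average clip length. The sketch below is only a back-of-the-envelope calculation. It assumes that each row holds exactly one audio clip and that the rounded "10k hours" figure covers all rows; neither assumption is confirmed by the page. The function name `mean_clip_seconds` is made up for this illustration.

```python
# Figures quoted on this page. "Approximately 10k hours" is a rounded
# estimate, so the result is only an order-of-magnitude guide.
TOTAL_HOURS = 10_000
NUM_EXAMPLES = 1_478_373


def mean_clip_seconds(total_hours: float, num_examples: int) -> float:
    """Average audio length per example in seconds.

    Assumes one audio clip per row (an assumption, not stated on the page).
    """
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return total_hours * 3600 / num_examples


avg = mean_clip_seconds(TOTAL_HOURS, NUM_EXAMPLES)
print(f"Mean clip length: {avg:.1f} s")
```

Under these assumptions the average works out to roughly 24 seconds per example, which would be consistent with short, segment-level utterances rather than whole videos.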

Details

Task: Automatic Speech Recognition, Audio Classification
Language: ZH, YUE
Format: Parquet
Rows / instances: 1,478,373
Size: 1M<n<10M
Creator: alvanlii
Year: 2024
Downloads: 551
Likes: 51
