ming030890/youtube_caption_yue

General NLP | English

ming030890/youtube_caption_yue is a General NLP dataset in English, published by ming030890 in Parquet format.

About ming030890/youtube_caption_yue

YouTube ASR Caption Dataset (Cantonese)

This dataset was built from YouTube videos with manually provided captions in Cantonese. We used SenseVoice to re-transcribe the audio and filtered segments to build a high-quality collection of audio-cap...
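
The description above does not say how segments were filtered. One common approach, shown here purely as an illustrative sketch, is to compare each manual caption against the ASR re-transcription and keep only segments whose character error rate (CER) stays under a threshold. The field names `caption` and `asr_text`, the helper names, and the 0.25 threshold are all assumptions and are not taken from the dataset; the SenseVoice call itself is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level error rate of hypothesis relative to reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def keep_segment(caption: str, asr_text: str, max_cer: float = 0.25) -> bool:
    """Hypothetical filter: keep a segment when caption and ASR output agree closely."""
    return char_error_rate(caption, asr_text) <= max_cer


# Made-up example segments; the field names are assumptions, not the dataset schema.
segments = [
    {"caption": "你好世界", "asr_text": "你好世界"},
    {"caption": "今日天氣好", "asr_text": "今日天氣很好"},
    {"caption": "多謝晒", "asr_text": "唔該"},
]
kept = [s for s in segments if keep_segment(s["caption"], s["asr_text"])]
```

Character-level comparison is used because Cantonese text is not whitespace-delimited, so a word-level error rate would need a separate segmentation step.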

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: ming030890
Year: 2025