ARTPARK-IISc/Vaani

Automatic Speech Recognition | Text To Speech | Image To Text | Text To Image | NE, AS, ML | cc-by-4.0

ARTPARK-IISc/Vaani is an automatic speech recognition dataset listed under the languages NE, AS, and ML. It provides 22,614,564 labeled examples in Parquet format, is released under the cc-by-4.0 license, falls in the 10M<n<100M size category, and has been downloaded 56.4K times.

About ARTPARK-IISc/Vaani

VAANI is an India-representative, multi-modal, multi-lingual dataset. The current version (phase 1: 80 districts; phase 2: 85 districts) contains ~31,255 hours of spontaneous, image-prompted speech by 156K speakers across 165 districts, talking abou...

Details

Task
Automatic Speech Recognition, Text To Speech, Image To Text, Text To Image
Language
NE, AS, ML
Format
Parquet
Rows / instances
22614564
Size
10M<n<100M
Creator
ARTPARK-IISc
Year
2024
License
cc-by-4.0
Downloads
56362
Likes
125
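
The figures quoted on this page can be cross-checked against each other with a little arithmetic. The sketch below is only an illustration: it uses the numbers as printed above (the hour and speaker counts are approximate), reads nothing from the dataset itself, and all variable and function names are made up for this example.

```python
# Sanity checks on the figures quoted on this page.
# All constants are copied from the page text; the names are illustrative only.

ROWS = 22_614_564                        # "Rows / instances"
SIZE_BUCKET = (10_000_000, 100_000_000)  # "10M<n<100M"
DOWNLOADS = 56_362                       # "Downloads"
HOURS = 31_255                           # approximate: "~31,255 hours"
SPEAKERS = 156_000                       # approximate: "156K speakers"
PHASE_DISTRICTS = {"phase 1": 80, "phase 2": 85}
TOTAL_DISTRICTS = 165


def in_bucket(n, bucket):
    """Return True if n lies strictly inside the (low, high) size bucket."""
    low, high = bucket
    return low < n < high


def minutes_per_speaker(hours, speakers):
    """Average recorded minutes per speaker, given total hours and speaker count."""
    return hours * 60 / speakers


rows_fit = in_bucket(ROWS, SIZE_BUCKET)
districts_match = sum(PHASE_DISTRICTS.values()) == TOTAL_DISTRICTS
downloads_in_k = round(DOWNLOADS / 1000, 1)
avg_minutes = minutes_per_speaker(HOURS, SPEAKERS)

print(f"rows inside size bucket: {rows_fit}")
print(f"phase districts sum to total: {districts_match}")
print(f"downloads in thousands: {downloads_in_k}")
print(f"average minutes per speaker: {avg_minutes:.1f}")
```

Under these numbers the row count sits inside the stated size category, the two phases add up to the 165 districts, and the average works out to roughly 12 minutes of speech per speaker. That last value is a rough estimate, since both of its inputs are rounded on the page.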

FAQ