ARTPARK-IISc/Vaani
Automatic Speech Recognition | Text To Speech | Image To Text | Text To Image | NE, AS, ML | cc-by-4.0
ARTPARK-IISc/Vaani is an automatic speech recognition-focused dataset in NE, AS, ML with 22,614,564 labeled examples in Parquet format. It is released under the cc-by-4.0 license, falls in the 10M<n<100M size category, and has been downloaded 56.4K times.
About ARTPARK-IISc/Vaani
VAANI is an India-representative multi-modal multi-lingual dataset.
The current version (phase 1: 80 districts; phase 2: 85 districts) contains ~31,255 hours of spontaneous, image-prompted speech by 156K speakers across 165 districts, talking abou...
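As a quick sanity check on the figures quoted above, the following minimal sketch recomputes the district total and derives rough per-district and per-speaker averages. The constants are copied from the description and are approximate ("~31,255 hours", "156K speakers"); the function name `summarize` and the derived averages are illustrative assumptions, not statistics published by the dataset authors.

```python
# Figures as quoted in the dataset description (approximate values).
PHASE_DISTRICTS = {"phase 1": 80, "phase 2": 85}
TOTAL_HOURS = 31_255       # "~31,255 hours"
TOTAL_SPEAKERS = 156_000   # "156K speakers", rounded in the source


def summarize(phase_districts, total_hours, total_speakers):
    """Derive rough averages from the headline numbers (illustrative only)."""
    districts = sum(phase_districts.values())
    return {
        "districts": districts,
        "hours_per_district": total_hours / districts,
        "minutes_per_speaker": total_hours * 60 / total_speakers,
    }


stats = summarize(PHASE_DISTRICTS, TOTAL_HOURS, TOTAL_SPEAKERS)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The two phase counts sum to the 165 districts stated in the description. The averages are only as precise as the rounded inputs, so treat them as order-of-magnitude estimates.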
Details
- Task: Automatic Speech Recognition, Text To Speech, Image To Text, Text To Image
- Language: NE, AS, ML
- Format: Parquet
- Rows / instances: 22,614,564
- Size: 10M<n<100M
- Creator: ARTPARK-IISc
- Year: 2024
- License: cc-by-4.0
- Downloads: 56,362
- Likes: 125