
ai4bharat/IndicVoices

General NLP · English · cc-by-4.0

Created by ai4bharat in 2025, ai4bharat/IndicVoices is listed as a General NLP dataset in English containing 6,379,012 records in Parquet format. It has 11.3K downloads and 70 likes, is released under the cc-by-4.0 license, and falls in the 1M<n<10M size category.

About ai4bharat/IndicVoices

IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages

Updates
[23 December 2025] We now have 11,200 hours of transcribed data! 🎉

Overview
INDICVOICES is a dataset...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 6,379,012
Size: 1M<n<10M
Creator: ai4bharat
Year: 2025
License: cc-by-4.0
Downloads: 11,316
Likes: 70