Sinoosoida/SpeechRu
Automatic Speech Recognition, Text To Speech, Audio Classification, RU
Sinoosoida/SpeechRu is a Russian-language (RU) dataset focused on automatic speech recognition, distributed in Parquet format.
About Sinoosoida/SpeechRu
Russian Podcasts (unlabeled)
~186k unlabeled Russian-language podcast episodes scraped from the web,
packaged as Parquet shards with the audio bytes embedded. The audio has no
transcripts — this is an unsupervised / self-supervised audio corpus...
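Because the audio bytes are embedded in each row, a consumer can decode an episode directly from memory without a separate audio file. The sketch below illustrates that idea under stated assumptions: the row layout (an `audio` field holding `bytes` and `path`) is hypothetical and not confirmed by this card, the payload is assumed to be WAV (compressed podcast formats would need a different decoder), and `make_silent_wav` only fabricates a stand-in payload so the example runs on its own.

```python
import io
import wave


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence as a stand-in for an embedded payload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def duration_seconds(audio_bytes: bytes) -> float:
    """Decode WAV bytes held in memory and return the clip length in seconds."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


# Hypothetical row layout (assumption): the real column names may differ.
row = {"audio": {"bytes": make_silent_wav(2.0), "path": "episode_0.wav"}}

print(duration_seconds(row["audio"]["bytes"]))  # prints 2.0
```

In practice the rows would come from a Parquet reader rather than a hand-built dictionary; the decoding step stays the same once the bytes are in hand.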
Details
- Task: Automatic Speech Recognition, Text To Speech, Audio Classification
- Language: RU
- Format: Parquet
- Rows / instances: N/A
- Creator: Sinoosoida
- Year: 2026