Question 1

What is the Sinoosoida/SpeechRu dataset?

Accepted Answer

Russian Podcasts (unlabeled)

~186k unlabeled Russian-language podcast episodes scraped from the web,
packaged as Parquet shards with the audio bytes embedded. The audio has no
transcripts — this is an unsupervised / self-supervised audio corpus...
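The audio is stored as raw bytes inside Parquet rows rather than as separate files, so a consumer usually has to work out the container format before decoding. Below is a minimal, stdlib-only sketch that guesses the format of one embedded blob from its header bytes. The catalog entry does not say which codec the episodes use, so the set of formats checked here is an assumption, and `sniff_audio_format` is a hypothetical helper name.

```python
def sniff_audio_format(blob: bytes) -> str:
    """Guess the container format of a raw audio blob from its magic bytes.

    The formats checked are an assumption; the dataset card does not
    state which codec the podcast episodes are stored in.
    """
    if blob[:4] == b"RIFF" and blob[8:12] == b"WAVE":
        return "wav"
    if blob[:4] == b"fLaC":
        return "flac"
    if blob[:4] == b"OggS":
        return "ogg"
    if blob[:3] == b"ID3":
        return "mp3"  # MP3 with a leading ID3v2 tag
    if len(blob) >= 2 and blob[0] == 0xFF:
        # AAC ADTS: 12-bit sync word 0xFFF followed by layer bits == 00.
        if (blob[1] & 0xF6) == 0xF0:
            return "aac"
        # MPEG audio frame: 11-bit sync; layer bits == 01 means Layer III (MP3).
        if (blob[1] & 0xE0) == 0xE0 and ((blob[1] >> 1) & 0x3) == 0b01:
            return "mp3"
    if blob[4:8] == b"ftyp":
        return "mp4"  # MP4 / M4A container
    return "unknown"


if __name__ == "__main__":
    print(sniff_audio_format(b"ID3\x04\x00\x00"))
```

Checking headers rather than trusting a file extension matters here because embedded bytes may carry no reliable filename at all.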

Question 2

Is Sinoosoida/SpeechRu a benchmark?

Accepted Answer

Sinoosoida/SpeechRu is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download Sinoosoida/SpeechRu?

Accepted Answer

Sinoosoida/SpeechRu is available at its source: https://huggingface.co/datasets/Sinoosoida/SpeechRu.
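One way to fetch a few episodes without downloading every shard is to use the Hugging Face `datasets` library in streaming mode. The sketch below assumes the repository exposes a `train` split and an audio column named `audio`. Neither is confirmed here, so check the dataset card. `extract_audio_bytes` and `stream_episodes` are hypothetical helper names. Casting to `Audio(decode=False)` keeps the embedded bytes as-is instead of decoding them to waveforms.

```python
from typing import Any, Iterator

REPO_ID = "Sinoosoida/SpeechRu"


def extract_audio_bytes(value: Any) -> bytes:
    """Return raw audio bytes from one cell of the audio column.

    The `datasets` Audio feature stores each cell as a struct with
    "bytes" and "path" fields; a plain binary column holds bytes directly.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, dict) and value.get("bytes") is not None:
        return bytes(value["bytes"])
    raise ValueError("cell does not contain embedded audio bytes")


def stream_episodes(audio_column: str = "audio", split: str = "train",
                    limit: int = 3) -> Iterator[bytes]:
    """Yield raw bytes for the first `limit` episodes via streaming.

    `audio_column` and `split` are assumptions; adjust them to the card.
    """
    # Imported lazily so extract_audio_bytes works without `datasets` installed.
    from datasets import Audio, load_dataset

    ds = load_dataset(REPO_ID, split=split, streaming=True)
    features = ds.features or {}
    if isinstance(features.get(audio_column), Audio):
        # Skip decoding; keep the original encoded bytes.
        ds = ds.cast_column(audio_column, Audio(decode=False))
    for row in ds.take(limit):
        yield extract_audio_bytes(row[audio_column])


if __name__ == "__main__":
    for i, blob in enumerate(stream_episodes(limit=2)):
        print(i, len(blob))
```

Streaming avoids pulling roughly 186k episodes to disk just to inspect a handful. For full offline use, downloading the Parquet shards directly is the alternative.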

Related dataset categories: Automatic Speech Recognition, Text To Speech, Audio Classification.