amazon-agi/SIFT-50M
Audio Text To Text, Audio Classification, Text To Speech, Audio To Audio | EN, DE, FR
amazon-agi/SIFT-50M is an audio-text-to-text dataset in English, German, and French (EN, DE, FR), published by amazon-agi in Parquet format. It is distributed under the cdla-sharing-1.0 license, falls in the 10M<n<100M size category, and has been downloaded about 7.6K times.
About amazon-agi/SIFT-50M
Dataset Card for SIFT-50M
SIFT-50M (Speech Instruction Fine-Tuning) is a 50-million-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). It is built from publicly available speech co...
Details
- Task: Audio Text To Text, Audio Classification, Text To Speech, Audio To Audio
- Language: EN, DE, FR
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: amazon-agi
- Year: 2025
- License: cdla-sharing-1.0
- Downloads: 7609
- Likes: 38
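
Because the dataset sits in the 10M<n<100M size category, it may be more practical to stream a few records than to download everything first. The following is a minimal sketch, assuming the repository is hosted on the Hugging Face Hub under the id shown in this listing and that the `datasets` library is installed. The helper names `list_configs` and `peek` are hypothetical, and the default split name `"train"` is an assumption that may not match the repository's actual layout; the configuration names are looked up at run time rather than guessed.

```python
from itertools import islice

# Repository id exactly as it appears in this listing.
REPO_ID = "amazon-agi/SIFT-50M"


def list_configs(repo_id=REPO_ID):
    """Return the configuration names the repository exposes (needs network access)."""
    # Imported lazily so this module can be loaded without `datasets` installed.
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)


def peek(config=None, split="train", n=3, repo_id=REPO_ID):
    """Stream the first `n` records of one configuration without a full download.

    `split="train"` is an assumed default; pass the split the repository actually uses.
    """
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches records on demand.
    stream = load_dataset(repo_id, name=config, split=split, streaming=True)
    return list(islice(stream, n))
```

A typical session would first call `list_configs()` to see which configurations exist, then pass one of the returned names to `peek(config=...)` and inspect the field names of the returned records before committing to a full download.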