Speech recognition (ASR) Models
Our directory lists 25 AI and NLP models for speech recognition (ASR), a machine-learning task. Browse the full list below, or explore models by provider.
Updated June 2026
- RNN-WER | Speech recognition (ASR) | DeepMind, University of Toronto
- MLN-ASR | Speech recognition (ASR) | McGill University
- RNN 1000/5 + RT09 LM (NIST RT05) | Speech recognition (ASR), Transcription | Brno University of Technology, Johns Hopkins University
- Gemini Robotics-ER 1.5 | Instruction interpretation, Robotic manipulation, Image captioning, Object detection, Search, Language modeling/generation, Question answering, Speech recognition (ASR) | Google DeepMind
- Gemini 2.5 Pro (Jun 2025) | Language modeling/generation, Question answering, Code generation, Quantitative reasoning, Visual question answering, Translation, Image captioning, Video description, Speech recognition (ASR) | Google DeepMind
- Gemini 2.5 Pro (May 2025) | Language modeling/generation, Question answering, Code generation, Quantitative reasoning, Visual question answering, Translation, Image captioning, Video description, Speech recognition (ASR) | Google DeepMind
- GPT-4o (Mar 2025) | Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text | OpenAI
- Gemini 2.5 Pro (Mar 2025) | Language modeling/generation, Question answering, Code generation, Quantitative reasoning, Visual question answering, Translation, Image captioning, Video description, Speech recognition (ASR) | Google DeepMind
- ERNIE-4.5-VL-424B-A47B (文心大模型4.5) | Language modeling/generation, Visual question answering, Video description, Speech recognition (ASR), Quantitative reasoning, Code generation, Translation, Question answering, Character recognition (OCR) | Baidu
- GPT-4o (Jan 2025) | Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text | OpenAI
- Gemini 2.0 Pro | Code generation, Language modeling/generation, Question answering, Visual question answering, Speech recognition (ASR), Video description | Google DeepMind
- GPT-4o (Nov 2024) | Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text | OpenAI
- GPT-4o (Aug 2024) | Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text | OpenAI
- GPT-4o | Chat, Image generation, Audio generation, Vision-language generation, Table tasks, Language modeling/generation, Question answering, Speech recognition (ASR), Speech-to-text | OpenAI
- SauTech | Speech recognition (ASR), Speech-to-text | Saudi Data and Artificial Intelligence Authority, Saudi Company for Artificial Intelligence
- Qwen3-Omni-30B-A3B | Language modeling/generation, Question answering, Visual question answering, Image captioning, Video description, Speech recognition (ASR), Speech synthesis, Speech-to-text, Text-to-speech (TTS) | Alibaba
- Gemini 2.5 Deep Think | Language modeling/generation, Mathematical reasoning, Code generation, Visual question answering, Question answering, Visual puzzles, Video description, Speech recognition (ASR), Speech-to-text | Google, Google DeepMind
- Reka Core | Chat, Language modeling/generation, Image captioning, Code generation, Code autocompletion, Question answering, Visual question answering, Video description, Speech recognition (ASR), Speech-to-text, Quantitative reasoning | Reka AI
- Gemini Nano-2 | Chat, Image captioning, Speech recognition (ASR) | Google DeepMind
- Gemini Nano-1 | Chat, Image captioning, Speech recognition (ASR) | Google DeepMind
- SeamlessM4T | Translation, Speech synthesis, Speech recognition (ASR), Speech-to-text, Speech-to-speech | Facebook, INRIA, University of California (UC) Berkeley
- Qwen-Audio-Chat | Audio question answering, Chat, Speech recognition (ASR), Translation, Transcription, Text classification, Question answering, Audio classification, Voice identification, Part-of-speech tagging, Speech-to-speech | Alibaba
- OmniVec | Image classification, Speech recognition (ASR) | TensorTour
- ONE-PEACE | Image classification, Speech recognition (ASR), Audio question answering, Audio classification, Semantic segmentation | Alibaba, Huazhong University of Science and Technology
- ImageBind | Image classification, Speech recognition (ASR), Image generation, Language modeling/generation | Meta AI