NVILA 15B
Developed in 2024 by NVIDIA, Massachusetts Institute of Technology (MIT), University of California (UC) Berkeley, University of California San Diego, University of Washington, and Tsinghua University, NVILA 15B is a visual question answering model with 15 billion parameters and openly available weights.
About NVILA 15B
Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy.
Details
- Provider: NVIDIA, Massachusetts Institute of Technology (MIT), University of California (UC) Berkeley, University of California San Diego, University of Washington, Tsinghua University
- Task: Visual question answering, Video description, Language modeling/generation, Question answering, Character recognition (OCR)
- Parameters: 15 billion (15,000,000,000)
- Released: 2024-12-05
- Open weights: Yes