VILA1.5-13B

NVIDIA, Massachusetts Institute of Technology (MIT) | Chat | Visual question answering | Image captioning | Language modeling/generation | Question answering | Open weights | cc-by-nc-4.0

VILA1.5-13B is an open-weights chat model from NVIDIA and the Massachusetts Institute of Technology (MIT) with 13,493,916,736 parameters (about 13.5 billion) that uses the transformers library. It has 136 downloads and 5 likes, and it is distributed under the cc-by-nc-4.0 license.

About VILA1.5-13B

Visual language models (VLMs) have progressed rapidly with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but the field lacks an in-depth study of the visual language p

Details

Provider: NVIDIA, Massachusetts Institute of Technology (MIT)
Task: Chat, Visual question answering, Image captioning, Language modeling/generation, Question answering
Parameters: 13,493,916,736
Library: transformers
License: cc-by-nc-4.0
Released: 2024-05-03
Open weights: Yes
Downloads: 136
Likes: 5