PaLI-3
PaLI-3 is a visual question answering model from Google DeepMind, Google Research, and Google Cloud, released in 2023 with 5 billion parameters.
About PaLI-3
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrain...
Details
- Provider: Google DeepMind, Google Research, Google Cloud
- Task: Visual question answering, Character recognition (OCR), Image captioning
- Parameters: 5 billion (5,000,000,000)
- Released: 2023-10-17
- Open weights: No