PaLI-3

Google DeepMind, Google Research, Google Cloud | Visual question answering, Character recognition (OCR), Image captioning

PaLI-3 is a visual question answering model from Google DeepMind, Google Research, and Google Cloud, released in 2023 with 5 billion parameters.

About PaLI-3

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrain

Details

Provider
Google DeepMind, Google Research, Google Cloud
Task
Visual question answering, Character recognition (OCR), Image captioning
Parameters
5 billion
Released
2023-10-17
Open weights
No