sarahlintang/IndoBERT

sarahlintang/IndoBERT is a machine learning model.

About sarahlintang/IndoBERT

IndoBERT is a pre-trained language model based on the BERT architecture for the Indonesian language. The model was trained using Google's original TensorFlow code on an eight-core Google Cloud TPU v2. It matches the bert-base configuration, with a vocabulary size of 32,000. The model is reported to outperform multilingual BERT on all downstream tasks. Training used a Google Cloud Storage bucket for persistent storage of training data and models. The training data consists of 16 GB of raw text (2 billion words) from the OSCAR corpus (https://oscar-corpus.com/). The training procedure has been,
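
As a rough illustration of how a BERT-style checkpoint like this one is typically used, the sketch below loads the model and averages its token vectors into a sentence embedding. It assumes the checkpoint is published on the Hugging Face Hub under the identifier `sarahlintang/IndoBERT` and loads with the `transformers` auto classes; the helper names `load_indobert` and `embed_sentence` are hypothetical and not part of the model's documentation.

```python
# Minimal sketch, not an official usage recipe.
# Assumption: the checkpoint is available on the Hugging Face Hub under this ID.
MODEL_ID = "sarahlintang/IndoBERT"


def load_indobert(model_id: str = MODEL_ID):
    """Download (or load from the local cache) the tokenizer and encoder."""
    # Imported lazily so the module can be imported without the heavy dependency.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed_sentence(text: str, tokenizer, model):
    """Return one vector for `text` by averaging the final-layer token vectors."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(**inputs)
    # last_hidden_state has shape (batch, tokens, hidden); average over the tokens.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)
```

Calling `tokenizer, model = load_indobert()` followed by `embed_sentence("Saya suka membaca buku.", tokenizer, model)` would return a single vector for the sentence. Mean pooling is only one simple choice here; fine-tuning on a downstream task is the more common way to use a pre-trained BERT model.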