cahya/bert-base-indonesian-522M

cahya/bert-base-indonesian-522M is a machine learning model.

About cahya/bert-base-indonesian-522M

It is a BERT-base model pre-trained on Indonesian Wikipedia using a masked language modeling (MLM) objective. This model is uncased: it does not distinguish between "indonesia" and "Indonesia". The texts are lowercased and tokenized using WordPiece with a vocabulary size of 32,000. The inputs of the model are then of the form: [CLS] Sentence A [SEP] Sentence B [SEP]. This is one of several language models that have been pre-trained on Indonesian datasets.
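To make the input format concrete, here is a minimal sketch of uncased WordPiece-style tokenization for a sentence pair. It is an illustration only: `TOY_VOCAB` is a made-up six-entry vocabulary (not the model's real 32,000-entry one), `wordpiece` and `encode_pair` are hypothetical helper names, and punctuation handling is omitted.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word maps to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def encode_pair(sentence_a, sentence_b, vocab):
    """Build the token sequence [CLS] A [SEP] B [SEP] from two sentences."""
    tokens = ["[CLS]"]
    for sentence in (sentence_a, sentence_b):
        # Uncased: lowercase before splitting on whitespace.
        for word in sentence.lower().split():
            tokens.extend(wordpiece(word, vocab))
        tokens.append("[SEP]")
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"indonesia", "negara", "ber", "##main", "saya", "suka"}

tokens_lower = encode_pair("saya suka indonesia", "bermain", TOY_VOCAB)
tokens_upper = encode_pair("Saya suka Indonesia", "Bermain", TOY_VOCAB)
print(tokens_lower)
```

Because the text is lowercased first, `tokens_lower` and `tokens_upper` are identical. In practice the model's own tokenizer would normally be loaded through the Hugging Face `transformers` library rather than reimplemented like this.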
