cahya/bert-base-indonesian-522M
cahya/bert-base-indonesian-522M is a machine learning model.
About cahya/bert-base-indonesian-522M
It is a BERT-base model pre-trained on Indonesian Wikipedia using a masked language modeling (MLM) objective. This model is uncased: it does not make a difference between indonesia and Indonesia. The inputs are lowercased and tokenized using WordPiece with a vocabulary size of 32,000. The inputs of the model are then of the form: [CLS] Sentence A [SEP] Sentence B [SEP]. This is one of several other language models that have been pre-trained with ind,
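
The sentence-pair input layout described above can be illustrated with a small sketch. This is not the model's actual tokenizer: whitespace splitting stands in for WordPiece, and the function name build_pair_input and the example sentences are hypothetical, chosen only to show the lowercasing step and the placement of the special tokens.

```python
def build_pair_input(sentence_a, sentence_b):
    """Lowercase two sentences and arrange them in BERT's sentence-pair layout."""
    # Assumption: whitespace splitting is a stand-in for WordPiece tokenization.
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()

    # Layout: [CLS] sentence A [SEP] sentence B [SEP]
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]

    # Segment ids: 0 covers [CLS], sentence A and the first [SEP];
    # 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input("Ibu kota Indonesia", "Jakarta kota besar")
print(" ".join(tokens))
# [CLS] ibu kota indonesia [SEP] jakarta kota besar [SEP]
```

Because the text is lowercased first, "Indonesia" and "indonesia" produce the same tokens. In practice the real WordPiece vocabulary would normally be loaded through the Hugging Face transformers library, for example with BertTokenizer.from_pretrained("cahya/bert-base-indonesian-522M"), which inserts the special tokens itself.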