DarshanDeshpande/marathi-distilbert
DarshanDeshpande/marathi-distilbert is a DistilBERT model for the Marathi language, trained with masked language modeling.
About DarshanDeshpande/marathi-distilbert
This version of Marathi-DistilBERT was trained from scratch on approximately 11.2 million sentences. Training used the Adam optimizer with a learning rate of 1e-4 and the default β1 and β2 values of 0.9 and 0.999 respectively, a total batch size of 256 on a v3-8 TPU, and a mask probability of 15%. The data was cleaned by removing text in languages other than Marathi while preserving common punctuation. The training data was extracted from a variety of sources, mainly Marathi newspapers and articles. The model has not been thoroughly tested and may contain biased opinions or inappropriate language. User discretion is advised.
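To make the 15% mask probability concrete, here is a minimal sketch of how tokens might be selected for masked language modeling. It is an illustration only, not the author's training code: the `mask_tokens` helper, the `[MASK]` token string, and whitespace tokenization are all assumptions, and the actual pipeline may use a subword tokenizer and a different replacement scheme.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask token; the real tokenizer's token may differ
MASK_PROB = 0.15       # mask probability stated in the model description


def mask_tokens(tokens, mask_prob=MASK_PROB, seed=None):
    """Return (masked_tokens, labels) for a masked language modeling example.

    Each token is selected independently with probability `mask_prob`.
    Selected tokens are replaced by MASK_TOKEN and stored in `labels`;
    unselected positions get a label of None (ignored by the loss).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    # Whitespace splitting is a simplification used only for this sketch.
    sentence = "मी आज शाळेत जात आहे".split()
    masked, labels = mask_tokens(sentence, seed=0)
    print(masked)
    print(labels)
```

Because each token is sampled independently, a short sentence may end up with no masked tokens at all; the 15% figure holds only on average over a large corpus.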