Fujitsu/pytorrent
Fujitsu/pytorrent is a machine learning model.
About Fujitsu/pytorrent
We use the PyTorrent dataset to train a preliminary DistilBERT masked language modeling (MLM) model from scratch. The training data consists of 1M raw Python scripts from PyTorrent, comprising 12,350,000 lines of code (LOC). We also train a byte-level byte-pair encoding (BPE) tokenizer with a vocabulary of 56,000 tokens; each LOC is truncated to a length of 50 to save computation resources. The trained model aims to help researchers work easily and efficiently on a large dataset of Python packages, using only five lines of code to load the transformer-based model. The model is trained with a masked language modeling (MLM) objective. It,
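As a rough illustration of the "five lines of code" claim, the sketch below loads the tokenizer and model through the Hugging Face transformers Auto classes. It makes several assumptions that are not confirmed by the description: that the identifier "Fujitsu/pytorrent" resolves on the Hugging Face Hub, that the checkpoint includes an MLM head, and that the truncation length of 50 is counted in tokens. The helper names load_pytorrent and encode_line are hypothetical and used only for illustration.

```python
MODEL_ID = "Fujitsu/pytorrent"  # assumed Hub identifier, taken from the page title
MAX_LENGTH = 50  # truncation length from the description; assumed to be in tokens


def load_pytorrent(model_id: str = MODEL_ID):
    """Load the tokenizer and MLM model (downloads the weights on first call)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


def encode_line(tokenizer, line: str, max_length: int = MAX_LENGTH):
    """Tokenize one line of code, truncating it to at most max_length tokens."""
    return tokenizer(line, truncation=True, max_length=max_length)
```

A typical call would be tokenizer, model = load_pytorrent() followed by encode_line(tokenizer, "import numpy as np"). Both calls need the transformers package, and the first also needs network access to fetch the checkpoint.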