seyonec/ChemBERTa-zinc-base-v1
seyonec/ChemBERTa-zinc-base-v1 is a machine learning model.
About seyonec/ChemBERTa-zinc-base-v1
Deep learning for chemistry and materials science remains a novel field with a lot of potential. However, transfer-learning-based methods, which are popular in areas such as NLP and computer vision, have not yet been effectively developed at the intersection of computational chemistry and machine learning. Using HuggingFace's suite of models and the ByteLevel tokenizer, we are able to train on a large corpus of 100k SMILES strings from a commonly known benchmark dataset, ZINC. We propose the use of attention visualization as a helpful tool for chemistry practitioners and students to quickly identify the substructures that matter for various chemical properties. The applications of open-sourcing large-scale transformer models such as RoBERTa may allow for the,
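As a rough illustration of how a RoBERTa-style model trained on SMILES strings might be queried, the sketch below masks one character of a SMILES string and asks the model to fill it in through the HuggingFace `transformers` fill-mask pipeline. The helper names `mask_smiles` and `predict_masked` are hypothetical and introduced only for this example. The `<mask>` token is assumed from the RoBERTa convention, and masking a single character is a simplification, since the tokenizer may group several characters into one token.

```python
MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"  # model name taken from this page
MASK_TOKEN = "<mask>"  # assumption: RoBERTa-style mask token


def mask_smiles(smiles: str, index: int) -> str:
    """Replace the character at `index` with the mask token (hypothetical helper)."""
    if not 0 <= index < len(smiles):
        raise IndexError("index out of range for SMILES string")
    return smiles[:index] + MASK_TOKEN + smiles[index + 1:]


def predict_masked(masked_smiles: str, top_k: int = 3):
    """Return (sequence, score) pairs for the top_k fill-mask candidates.

    Assumes `transformers` and a backend such as PyTorch are installed;
    the model weights are downloaded on first use.
    """
    from transformers import pipeline  # imported lazily so mask_smiles works without it

    fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    predictions = fill_mask(masked_smiles, top_k=top_k)
    return [(p["sequence"], p["score"]) for p in predictions]
```

For example, `mask_smiles("C1=CC=CC=C1", 8)` masks the last double bond of a benzene ring written in Kekulé form, and passing the result to `predict_masked` would return the model's candidate completions with their scores. No scores are shown here because they depend on the downloaded weights.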