ceshine/TinyBERT_L-4_H-312_v2-distill-AllNLI
ceshine/TinyBERT_L-4_H-312_v2-distill-AllNLI is a machine learning model.
About ceshine/TinyBERT_L-4_H-312_v2-distill-AllNLI
This model is distilled from the bert-base-nli-stsb-mean-tokens pre-trained model from Sentence-Transformers. The embedding vector is obtained by mean (average) pooling of the last layer's hidden states. We compute the cosine similarity of the embeddings of each sentence pair and use those scores to get the Spearman correlation on the STS benchmark (bigger is better).

Update 20210325: Added the attention matrices imitation objective as in the TinyBERT paper, and the distill target has been changed from distilbert-base to bert-base-nli-stsb-mean-tokens.
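The pooling and scoring steps described above can be sketched with plain NumPy. This is a minimal illustration, not the author's evaluation code: the function names (mean_pool, cosine_similarity, spearman) are hypothetical, the toy arrays stand in for real last-layer hidden states, and masking out padding tokens before averaging is an assumption (the description only says mean pooling). The rank-based Spearman helper also assumes there are no tied scores.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Average last-layer token vectors, skipping padding positions.

    hidden_states: (batch, seq_len, hidden) array
    attention_mask: (batch, seq_len) array, 1 for real tokens, 0 for padding
    (masking padding is an assumption, not stated in the model description)
    """
    hidden_states = np.asarray(hidden_states, dtype=float)
    mask = np.asarray(attention_mask, dtype=float)[..., None]
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (batch, hidden) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = (a * b).sum(axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dot / norms


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Toy stand-in for hidden states: one sentence, three positions, last is padding.
toy_hidden = [[[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]]]
toy_mask = [[1, 1, 0]]
toy_embedding = mean_pool(toy_hidden, toy_mask)

# Toy predicted similarities versus made-up reference scores for three pairs.
toy_predicted = np.array([0.1, 0.5, 0.9])
toy_reference = np.array([1.0, 3.0, 5.0])
toy_correlation = spearman(toy_predicted, toy_reference)
```

In practice the hidden states and attention mask would come from running the model on tokenized sentences, and a library routine that handles ties (such as scipy.stats.spearmanr) would be a safer choice for the correlation.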