Applying SoftTriple Loss for Supervised Language Model Fine Tuning

Sosnowski, Witold, Wroblewska, Anna, Gawrysiak, Piotr

Dec-15-2021–arXiv.org Artificial Intelligence

Natural language processing (NLP) is a rapidly growing area of machine learning with applications wherever a computer needs to operate on text in a way that involves capturing its semantics. These applications include text classification, translation, text summarization, question answering, and dialogue. All these tasks are downstream and depend on the quality of the text representation (White et al., 2015). Many models can produce such text representations, from Bag-of-Words or Word2Vec word embeddings to the language representation model BERT and its variations, which are state of the art in most NLP tasks. The best performance on text classification tasks is obtained when the model is first trained on a general knowledge corpus to capture semantic relationships between words and then fine-tuned with an additional dense layer on a domain corpus with cross-entropy loss (Radford et al., 2019). We introduce a new loss function, TripleEntropy, based on cross-entropy loss and SoftTriple loss, to improve classification performance when fine-tuning general knowledge pre-trained language models (Devlin et al., 2018; Qian et al., 2019). Triplet loss transforms the embedding space so that vector representations from the same class form separable subspaces, stabilizing and generalizing the language model fine-tuning process. TripleEntropy can improve the fine-tuning process of RoBERTa-based models so that performance on downstream tasks increases by about 0.02% to 2.29%.
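The idea of combining cross-entropy with SoftTriple loss can be sketched for a single training example as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the two terms are combined, so the convex weighting with a hypothetical parameter `beta` is an assumption, as are the function names and the default values of `lam`, `gamma`, and `delta`. The SoftTriple term follows the general form in Qian et al. (2019), with several centers per class and a softmax-weighted similarity, and omits that paper's center regularizer.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cross_entropy(logits, label):
    """Standard softmax cross-entropy for one example."""
    return _log_sum_exp(logits) - logits[label]


def soft_triple(embedding, centers, label, lam=20.0, gamma=0.1, delta=0.01):
    """SoftTriple-style loss for one example.

    centers[c] is a list of center vectors for class c (illustrative layout).
    lam, gamma, delta are illustrative defaults, not values from this paper.
    """
    x = _normalize(embedding)
    sims = []
    for class_centers in centers:
        dots = [_dot(x, _normalize(w)) for w in class_centers]
        # Softmax over the centers of this class, with temperature gamma.
        scaled = [d / gamma for d in dots]
        lse = _log_sum_exp(scaled)
        weights = [math.exp(s - lse) for s in scaled]
        # Relaxed similarity between the example and the class.
        sims.append(sum(w * d for w, d in zip(weights, dots)))
    # Subtract the margin delta from the true class only, then scale by lam.
    logits = [lam * (s - delta) if c == label else lam * s
              for c, s in enumerate(sims)]
    return cross_entropy(logits, label)


def triple_entropy(logits, embedding, centers, label, beta=0.5):
    """Assumed combination: beta * cross-entropy + (1 - beta) * SoftTriple."""
    ce = cross_entropy(logits, label)
    st = soft_triple(embedding, centers, label)
    return beta * ce + (1.0 - beta) * st
```

In a fine-tuning setup, `logits` would come from the classification head and `embedding` from the encoder's sentence representation, with the class centers learned alongside the model weights; here they are plain lists so the arithmetic is easy to follow.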


