Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

Oct-20-2022–arXiv.org Artificial Intelligence

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying the relationships among them is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs, which is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM as a linear combination of other LMs used as a basis, and we derive the closed-form solution. We define a goodness-of-fit metric for LMD, similar to the coefficient of determination, and use it to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly "correlated". To further advance the SOTA, we need more diverse and novel LMs that are less dependent on existing ones.
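
The abstract does not spell out the LMD formulation, so the following is only a minimal sketch of the general idea under stated assumptions: each LM is summarized by a representation matrix of the same shape (rows are inputs, columns are embedding dimensions), the target LM is fit by one scalar weight per basis LM, and the fit is scored with an uncentered R²-style ratio. The function name `decompose`, the scalar-weight form, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def decompose(target, basis):
    """Fit `target` (n x d) as a weighted sum of the matrices in `basis`.

    Assumption: one scalar weight per basis matrix, no intercept.
    Returns the weights and an R^2-style goodness of fit.
    """
    # Flatten every representation matrix into one long column vector.
    A = np.stack([b.ravel() for b in basis], axis=1)  # shape (n*d, k)
    y = target.ravel()

    # Closed-form least-squares solution of min_w ||y - A w||^2.
    weights, *_ = np.linalg.lstsq(A, y, rcond=None)

    # Uncentered goodness of fit: 1 - residual SS / total SS.
    residual = y - A @ weights
    r2 = 1.0 - np.sum(residual ** 2) / np.sum(y ** 2)
    return weights, r2


# Synthetic stand-ins for LM representations (random data, not real models).
rng = np.random.default_rng(0)
basis = [rng.standard_normal((50, 8)) for _ in range(3)]
true_w = np.array([0.5, -1.0, 2.0])

# A target that is an exact linear combination of the basis.
dependent = sum(w * b for w, b in zip(true_w, basis))
w_dep, r2_dep = decompose(dependent, basis)

# A target drawn independently of the basis.
independent = rng.standard_normal((50, 8))
w_ind, r2_ind = decompose(independent, basis)
```

In this toy setting, a score near 1 means the target is almost fully explained by the basis, and a score near 0 means it is largely independent of it. The 91% figure reported above should be read against the paper's own definition of the metric, which may differ from this sketch.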

large language model, machine learning, natural language

Country:
- Europe > Denmark (0.04)
- Asia > China (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - New York (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
  - California > Alameda County
    - Berkeley (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.31)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
