Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Jan-13-2025, 14:10:52 GMT–Neural Information Processing Systems

Pretrained language models have achieved state-of-the-art performance when adapted to downstream NLP tasks. However, theoretical analysis of these models is scarce and challenging, since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks through an underlying latent-variable generative model of text: the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language.
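
To make the phrase "a function of the posterior distribution over the latent variables" concrete, the following is a minimal sketch, not the paper's construction. It assumes a toy two-state discrete HMM with made-up parameters, computes the posterior over the final hidden state with the standard normalized forward recursion, and applies a hand-set linear head to that posterior. The names `forward_posterior` and `linear_head`, the choice of the final hidden state as the feature, and all numeric values are illustrative assumptions.

```python
def forward_posterior(init, trans, emit, tokens):
    """Return P(h_T | x_1..x_T) for a discrete HMM (normalized forward recursion).

    init[h]     : P(h_1 = h)
    trans[g][h] : P(h_{t+1} = h | h_t = g)
    emit[h][x]  : P(x_t = x | h_t = h)
    tokens      : non-empty list of observed token ids
    """
    n = len(init)
    # First step: prior times emission likelihood, then normalize.
    alpha = [init[h] * emit[h][tokens[0]] for h in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for x in tokens[1:]:
        # Propagate through the transition matrix, reweight by the emission.
        alpha = [
            emit[h][x] * sum(alpha[g] * trans[g][h] for g in range(n))
            for h in range(n)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


def linear_head(weights, bias, features):
    """A linear classifier score on top of fixed (frozen) features."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Toy parameters (assumed for illustration only).
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.8, 0.2], [0.2, 0.8]]

# Posterior over the last hidden state after seeing token 0 three times.
posterior = forward_posterior(INIT, TRANS, EMIT, [0, 0, 0])

# Hypothetical head: positive score means "hidden state 0 is more likely".
score = linear_head([1.0, -1.0], 0.0, posterior)
```

In this sketch the posterior plays the role of the frozen features and only the head's weights would be learned, which loosely mirrors head tuning as described above; the analysis in the paper concerns when a pretrained model's outputs carry enough information for such a head to succeed.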

downstream task, head and prompt tuning, pretrained language model help, (4 more...)
