Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

Neural Information Processing Systems 

The two-stage fine-tuning (FT) method, linear probing (LP) then fine-tuning (LP-FT), outperforms linear probing and FT alone. This holds true for both in-distribution (ID) and out-of-distribution (OOD) data. One key reason for its success is the preservation of pre-trained features, achieved by obtaining a near-optimal linear head during LP. However, despite the widespread use of large language models, there has been limited exploration of more complex architectures such as Transformers. In this paper, we analyze the training dynamics of LP-FT for classification tasks on the basis of the neural tangent kernel (NTK) theory.
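
To make the two-stage recipe concrete, the following is a minimal PyTorch sketch of LP-FT, not the paper's code: it assumes a toy backbone standing in for a pre-trained encoder, a linear classification head, and synthetic data; all names (backbone, head, train) are illustrative. Stage 1 freezes the encoder and fits only the head; Stage 2 unfreezes everything and fine-tunes end to end from the near-optimal head obtained in Stage 1.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic classification data (placeholder for a real dataset).
X = torch.randn(128, 32)
y = torch.randint(0, 3, (128,))
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # stand-in for a pre-trained encoder
head = nn.Linear(64, 3)                                  # randomly initialized linear head
criterion = nn.CrossEntropyLoss()

def train(params, lr, epochs):
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = criterion(head(backbone(xb)), yb)
            loss.backward()
            opt.step()

# Stage 1 (LP): freeze the backbone and fit only the linear head.
for p in backbone.parameters():
    p.requires_grad = False
train(head.parameters(), lr=1e-2, epochs=5)

# Stage 2 (FT): unfreeze and fine-tune all parameters, starting from
# the near-optimal head, which helps preserve the pre-trained features.
for p in backbone.parameters():
    p.requires_grad = True
train(list(backbone.parameters()) + list(head.parameters()), lr=1e-4, epochs=5)

A smaller learning rate is used in Stage 2 here as a common practical choice; the specific hyperparameters are illustrative, not taken from the paper.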
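The NTK mentioned in the abstract can also be made concrete. Below is a hedged sketch, independent of the paper's analysis, of computing one entry of the empirical NTK for a toy scalar-output network: the kernel value K(x1, x2) is the inner product of the gradients of the network output with respect to all parameters, evaluated at the two inputs. The model here is a hypothetical stand-in, not a Transformer.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))

def grad_vector(x):
    # Flattened gradient of the scalar output f(x; theta) w.r.t. all parameters theta.
    out = model(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, tuple(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

x1, x2 = torch.randn(4), torch.randn(4)
# Empirical NTK entry: K(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>.
k12 = grad_vector(x1) @ grad_vector(x2)
print(k12.item())

NTK theory studies training dynamics through this kernel: in the regime where the kernel stays approximately fixed during training, gradient descent on the network behaves like kernel regression with K.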
