Layer-Wise Evolution of Representations in Fine-Tuned Transformers: Insights from Sparse AutoEncoders

Feb-23-2025–arXiv.org Artificial Intelligence

Fine-tuning pre-trained transformers is a powerful technique for enhancing the performance of base models on specific tasks. From early applications in models like BERT to fine-tuning Large Language Models (LLMs), this approach has been instrumental in adapting general-purpose architectures for specialized downstream tasks. Understanding the fine-tuning process is crucial for uncovering how transformers adapt to specific objectives, retain general representations, and acquire task-specific features. This paper explores the underlying mechanisms of fine-tuning, specifically in the BERT transformer, by analyzing activation similarity, training Sparse AutoEncoders (SAEs), and visualizing token-level activations across different layers. Based on experiments conducted across multiple datasets and BERT layers, we observe a steady progression in how features adapt to the task at hand: early layers primarily retain general representations, middle layers act as a transition between general and task-specific features, and later layers fully specialize in task adaptation. These findings provide key insights into the inner workings of fine-tuning and its impact on representation learning within transformer architectures.

activation, fine-tuning, representation, (16 more...)

arXiv.org Artificial Intelligence

Feb-23-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (0.46)
- Leisure & Entertainment (0.30)
- Media (0.30)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found