AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Tao Yang

May-29-2025, 19:53:06 GMT–Neural Information Processing Systems

Fine-tuning large pre-trained language models on downstream tasks is prone to overfitting when training data is limited. While dropout is an effective remedy that randomly drops a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting.
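To make the idea of dropping low-attribution attention positions concrete, here is a minimal sketch for a single attention row. It is an illustration under stated assumptions, not the paper's implementation: the attribution scores are taken as given inputs, the function name `drop_low_attribution` and the `drop_fraction` parameter are hypothetical, and the lowest-scoring positions are dropped deterministically rather than sampled at random.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def drop_low_attribution(logits, attributions, drop_fraction):
    """Mask the lowest-attribution positions of one attention row, then renormalize.

    logits:        pre-softmax attention scores for one query (list of floats).
    attributions:  one attribution score per position (assumed to be given).
    drop_fraction: share of positions to drop, rounded down; hypothetical knob.
    """
    n = len(logits)
    # Always keep at least one position so the softmax stays well defined.
    num_drop = min(int(n * drop_fraction), n - 1)
    # Positions sorted by ascending attribution; the first num_drop are dropped.
    order = sorted(range(n), key=lambda i: attributions[i])
    dropped = set(order[:num_drop])
    masked = [float("-inf") if i in dropped else logits[i] for i in range(n)]
    return softmax(masked), dropped


# Toy example: four positions, drop the half with the lowest attribution.
weights, dropped = drop_low_attribution(
    logits=[2.0, 1.0, 0.5, 0.1],
    attributions=[0.9, 0.05, 0.4, 0.01],
    drop_fraction=0.5,
)
print(sorted(dropped))
print([round(w, 4) for w in weights])
```

Masking the logits before the softmax, rather than zeroing the weights afterwards, keeps the remaining attention weights summing to one.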

artificial intelligence, machine learning, natural language

Country:
- Asia > Middle East (0.14)
- Europe > Portugal (0.14)

Genre:
- Research Report (0.68)

Industry:
- Banking & Finance > Insurance (0.52)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (1.00)
  - Natural Language > Text Processing (0.68)