AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Tao Yang
Neural Information Processing Systems
Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting.
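The core observation — that dropping *low*-attribution attention positions worsens overfitting — motivates doing the opposite: building the candidate drop set from *high*-attribution positions and masking a random subset of them. The sketch below illustrates that masking step only; the function name, the `candidate_ratio` and `drop_prob` parameters, and the use of a precomputed attribution matrix are all assumptions for illustration, not the paper's exact algorithm (which derives attributions from attention gradients).

```python
import numpy as np

def ad_drop_mask(attribution, candidate_ratio=0.3, drop_prob=0.5, rng=None):
    """Illustrative attribution-driven dropout mask (hypothetical interface).

    attribution: array of per-position attribution scores (higher = more
    influential on the prediction).
    Returns a boolean mask of the same shape; False marks dropped positions.
    """
    rng = rng or np.random.default_rng()
    flat = attribution.ravel()
    # Candidate set: the top-k positions by attribution score.
    k = max(1, int(candidate_ratio * flat.size))
    candidates = np.argpartition(flat, -k)[-k:]
    # Drop each candidate independently with probability drop_prob.
    dropped = candidates[rng.random(k) < drop_prob]
    mask = np.ones(flat.size, dtype=bool)
    mask[dropped] = False
    return mask.reshape(attribution.shape)
```

In practice such a mask would be applied to the attention logits (e.g. setting dropped positions to a large negative value before the softmax), so low-attribution positions are always kept while a random portion of high-attribution ones is suppressed during fine-tuning.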