AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

Tao Yang

Neural Information Processing Systems 

This observation suggests that dropout should not treat all attention positions alike.
