Memory Efficient Continual Learning with Transformers

Neural Information Processing Systems 

To address the problem of incrementally fine-tuning pre-trained Transformers in a sequential learning setting without catastrophic forgetting (CF), we propose Adaptive Distillation of Adapters (ADA).
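ADA builds on adapter modules, small bottleneck layers inserted into a frozen pre-trained Transformer. As a minimal sketch of that generic pattern (the standard Houlsby-style adapter, not the paper's exact ADA design; all names and dimensions here are illustrative assumptions):

```python
import numpy as np

class Adapter:
    """A bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Only these small matrices are trained; the Transformer backbone stays frozen,
    which is what makes adapter-based continual learning memory-efficient."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random init for the down/up projections (illustrative choice).
        self.W_down = rng.normal(0.0, 0.02, (hidden_dim, bottleneck_dim))
        self.W_up = rng.normal(0.0, 0.02, (bottleneck_dim, hidden_dim))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # ReLU bottleneck, then project back up and add the residual.
        z = np.maximum(h @ self.W_down, 0.0)
        return h + z @ self.W_up

# Usage: one adapter per task leaves earlier tasks' parameters untouched,
# avoiding the interference that causes catastrophic forgetting.
adapter = Adapter(hidden_dim=768, bottleneck_dim=64)
h = np.zeros((2, 768))   # a batch of hidden states from the frozen backbone
out = adapter(h)
print(out.shape)         # (2, 768)
```

The residual connection means an adapter initialized near zero starts as an identity map, so inserting it does not perturb the pre-trained model's behavior.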
