Differentiable Architecture Pruning for Transfer Learning
Transfer learning methods aim to produce machine learning models that are trained on a given problem but also perform well on new, different tasks. The interest in transfer learning comes from situations where large data sets are available for solving a given training task but the data associated with new tasks are too scarce to train expressive models from scratch. The general transfer-learning strategy is to use the small amount of available new data to adapt a large model that has been previously optimized on the training data. One option is to keep the structure of the pre-trained large model intact and fine-tune its weights to solve the new task. When very few data points are available and the pre-trained network is large, however, customized regularization strategies are needed to mitigate the risk of over-fitting. Fine-tuning only a few parameters is a possible way out, but it can strongly limit the performance of the final model. Another option is to prune the pre-trained model to reduce its complexity, increase transferability, and prevent overfitting. Existing strategies, however, focus on optimized models and are unable to disentangle the network architecture from the attached weights. As a consequence, the pruned version of the original model can hardly be interpreted as a transferable new architecture, and it is difficult to reuse it on new tasks.
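To make the "fine-tuning only a few parameters" strategy mentioned above concrete, the following is a minimal illustrative sketch (not the paper's method) of freezing a pre-trained backbone and adapting only a small classification head. The choice of a torchvision ResNet-18 backbone and a 10-class target task are assumptions made purely for illustration.

```python
# Illustrative sketch, not from the paper: fine-tuning only a few parameters
# by freezing a pre-trained backbone and training a new head on the small
# target data set. Backbone choice and class count are assumptions.
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on the large source task (here: ImageNet weights).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pre-trained weights so they are not updated on the new task.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a new head for the target task
# (assume the new task has 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters are passed to the optimizer,
# so the number of trainable parameters stays small.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Freezing the backbone reduces over-fitting risk on scarce data, but, as the abstract notes, restricting adaptation to so few parameters can also cap the final model's performance.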
July 7, 2021