Auto-Transfer: Learning to Route Transferrable Representations
Keerthiram Murugesan, Vijay Sadashivaiah, Ronny Luss, Karthikeyan Shanmugam, Pin-Yu Chen, Amit Dhurandhar
arXiv.org Artificial Intelligence
Knowledge transfer between heterogeneous source and target networks and tasks has received considerable attention in recent times, as large amounts of quality labeled data can be difficult to obtain in many applications. Existing approaches typically constrain the target deep neural network (DNN) feature representations to be close to the source DNN's feature representations, which can be limiting. In this paper, we propose a novel adversarial multi-armed bandit approach that automatically learns to route source representations to appropriate target representations, following which they are combined in meaningful ways to produce accurate target models (a minimal sketch of the bandit routing idea is given below). We see upwards of 5% accuracy improvements compared with state-of-the-art knowledge transfer methods on four benchmark (target) image datasets, CUB200, Stanford Dogs, MIT67, and Stanford40, where the source dataset is ImageNet. We qualitatively analyze the goodness of our transfer scheme by showing individual examples of the important features our target network focuses on in different layers compared with the (closest) competitors. We also observe that our improvement over other methods is higher for smaller target datasets, making it an effective tool for small-data applications that may benefit from transfer learning.

Deep learning models have become increasingly good at learning from large amounts of labeled data. However, it is often difficult and expensive to collect a sufficient amount of labeled data for training a deep neural network (DNN). Transfer learning utilizes the knowledge from information-rich source tasks to learn a specific (often information-poor) target task. There are several ways to transfer knowledge from a source task to a target task (Pan & Yang, 2009), but the most widely used approach is fine-tuning (Sharif Razavian et al., 2014), where the target DNN being trained is initialized with the weights/representations of a source (often large) DNN (e.g., ResNet (He et al., 2016)) that has been pre-trained on a large dataset (e.g., ImageNet). In spite of its popularity, fine-tuning may not be ideal when the source and target tasks/networks are heterogeneous, i.e., have differing feature spaces or distributions (Ryu et al., 2020; Tsai et al., 2020).
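For concreteness, the following is a minimal fine-tuning sketch in PyTorch. The class count, learning rate, and the `target_loader` placeholder are illustrative assumptions, not choices from the paper.

```python
import torch
import torchvision.models as models

# Fine-tuning: start from an ImageNet-pretrained source network, replace the
# classification head to match the target task, and train on target data.
num_target_classes = 200  # e.g., CUB200 has 200 bird classes

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

# Training loop over a hypothetical target-task data loader:
# for images, labels in target_loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```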
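By contrast, here is a minimal sketch of the routing idea using an EXP3-style adversarial bandit: each target layer could maintain a bandit whose arms are candidate source representations (or combination choices), with importance-weighted reward updates favoring routings that help the target task. The class name, arm semantics, and reward definition are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

class Exp3Router:
    """EXP3 adversarial bandit; one instance per target layer. Each arm
    corresponds to a choice of source representation to route in."""

    def __init__(self, num_arms, gamma=0.1):
        self.num_arms = num_arms
        self.gamma = gamma  # exploration rate
        self.weights = np.ones(num_arms)

    def probs(self):
        # Mix the weight-proportional distribution with uniform exploration.
        w = self.weights / self.weights.sum()
        return (1 - self.gamma) * w + self.gamma / self.num_arms

    def choose(self):
        p = self.probs()
        arm = np.random.choice(self.num_arms, p=p)
        return arm, p[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate keeps the update unbiased
        # even though only the chosen arm's reward is observed.
        est = reward / prob
        self.weights[arm] *= np.exp(self.gamma * est / self.num_arms)

# Usage: the reward could be, e.g., the reduction in validation loss after
# routing source representation `arm` into this target layer (assumed here).
router = Exp3Router(num_arms=4)
arm, p = router.choose()
reward = 0.7  # placeholder reward in [0, 1]
router.update(arm, p, reward)
```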
Feb 3, 2022