Transfer Learning is the reuse of a pre-trained model on a new problem. (Towards Data Science)
Transfer learning is a method of reusing a model or knowledge for another related task. Transfer learning is sometimes also considered as an extension of existing ML algorithms. Extensive research and work is being done in the context of transfer learning and on understanding how knowledge can be transferred among tasks. However, the Neural Information Processing Systems (NIPS) 1995 workshop Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems is believed to have provided the initial motivations for research in this field. The literature on transfer learning has gone through a lot of iterations, and the terms associated with it have been used loosely and often interchangeably.
The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples. We compare state-of-the-art methods from these three paradigms and also explore hybrid adapted-embedding methods that use limited target-domain data to fine tune embeddings constructed from source-domain data. We conduct a systematic comparison of methods in a variety of domains, varying the number of labeled instances available in the target domain (k), as well as the number of target-domain classes. We reach three principal conclusions: (1) Deep embeddings are far superior, compared to weight transfer, as a starting point for inter-domain transfer or model re-use (2) Our hybrid methods robustly outperform every few-shot learning and every deep metric learning method previously proposed, with a mean error reduction of 34% over state-of-the-art. (3) Among loss functions for discovering embeddings, the histogram loss (Ustinova & Lempitsky, 2016) is most robust. We hope our results will motivate a unification of research in weight transfer, deep metric learning, and few-shot learning.
Inertial confinement fusion (ICF) experiments are designed using computer simulations that are approximations of reality, and therefore must be calibrated to accurately predict experimental observations. In this work, we propose a novel nonlinear technique for calibrating from simulations to experiments, or from low fidelity simulations to high fidelity simulations, via "transfer learning". Transfer learning is a commonly used technique in the machine learning community, in which models trained on one task are partially retrained to solve a separate, but related task, for which there is a limited quantity of data. We introduce the idea of hierarchical transfer learning, in which neural networks trained on low fidelity models are calibrated to high fidelity models, then to experimental data. This technique essentially bootstraps the calibration process, enabling the creation of models which predict high fidelity simulations or experiments with minimal computational cost. We apply this technique to a database of ICF simulations and experiments carried out at the Omega laser facility. Transfer learning with deep neural networks enables the creation of models that are more predictive of Omega experiments than simulations alone. The calibrated models accurately predict future Omega experiments, and are used to search for new, optimal implosion designs.
On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices. This paper proposes FixyNN, a co-designed hardware accelerator platform which splits a CNN model into two parts: a set of layers that are fixed in the hardware platform as a front-end fixed-weight feature extractor, and the remaining layers which become a back-end classifier running on a conventional programmable CNN accelerator. The common front-end provides ubiquitous CNN features for all FixyNN models, while the back-end is programmable and specific to a given dataset. Image classification models for FixyNN are trained end-to-end via transfer learning, with front-end layers fixed for the shared feature extractor, and back-end layers fine-tuned for a specific task. Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of <1%, resulting in a FixyNN hardware platform with nearly 2 times better energy efficiency than a conventional programmable CNN accelerator of the same silicon area (i.e. hardware cost).
In recent years, supervised machine learning models have demonstrated tremendous success in a variety of application domains. Despite the promising results, these successful models are data hungry and their performance relies heavily on the size of training data. However, in many healthcare applications it is difficult to collect sufficiently large training datasets. Transfer learning can help overcome this issue by transferring the knowledge from readily available datasets (source) to a new dataset (target). In this work, we propose a hybrid instance-based transfer learning method that outperforms a set of baselines including state-of-the-art instance-based transfer learning approaches. Our method uses a probabilistic weighting strategy to fuse information from the source domain to the model learned in the target domain. Our method is generic, applicable to multiple source domains, and robust with respect to negative transfer. We demonstrate the effectiveness of our approach through extensive experiments for two different applications.
Deep domain adaptation has recently undergone a big success. Compared with shallow domain adaptation, deep domain adaptation has shown higher predictive performance and stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint feature space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly appealing and motivating, but under the theoretical perspective, none of the theory has been developed to support this. In this paper, we have developed a rigorous theory to explain why we can bridge the relevant gap in an intermediate joint space. Under the light of our proposed theory, it turns out that there is a strong connection between deep domain adaptation and Wasserstein (WS) distance. More specifically, our theory revolves the following points: i) first, we propose a context wherein we can perfectly perform a transfer learning and ii) second, we further prove that by means of bridging the relevant gap and minimizing some reconstruction errors we are minimizing a WS distance between the push forward source distribution and the target distribution via a transport that maps from the source to target domains.
This spring I presented a talk entitled "Effective Transfer Learning for NLP" at ODSC East. The talk was intended to demonstrate how surprisingly effective pre-trained word and document embeddings are at low training data volumes, and to lay out a set of practical recommendations for applying these techniques to your own tasks. Thanks to some excellent research by Alec Radford and the team at OpenAI, our recommendations are beginning to change. To explain why the tides are shifting, let's first walk through the rubric we use at Indico to evaluate whether or not a novel machine learning method is viable for industry use. Let's see how well pre-trained word document embeddings satisfy these requirements: In short, using pre-trained embeddings is computationally cheap and performs well at the lower extremes of training data availability, but using static representations imposes an unfortunate cap on the benefit gained from additional training data.
In Transfer Learning, the knowledge of an already trained Machine Learning model is applied to a different but related problem. For example, if you trained a simple classifier to predict whether an image contains a backpack, you could use the knowledge that the model gained during its training to recognize other objects like sunglasses. With transfer learning, we basically try to exploit what has been learned in one task to improve generalization in another. We transfer the weights that a Network has learned at Task A to a new Task B. The general idea is to use knowledge, that a model has learned from a task where a lot of labeled training data is available, in a new task where we don't have a lot of data. Instead of starting the learning process from scratch, you start from patterns that have been learned from solving a related task.
Transfer learning has been proven effective when within-target labeled data is scarce. A lot of works have developed successful algorithms and empirically observed positive transfer effect that improves target generalization error using source knowledge. However, theoretical analysis of transfer learning is more challenging due to the nature of the problem and thus is less studied. In this report, we do a survey of theoretical works in transfer learning and summarize key theoretical guarantees that prove the effectiveness of transfer learning. The theoretical bounds are derived using model complexity and learning algorithm stability. As we should see, these works exhibit a trade-off between tight bounds and restrictive assumptions. Moreover, we also prove a new generalization bound for the multi-source transfer learning problem using the VC-theory, which is more informative than the one proved in previous work.