We are indeed living in interesting times, where we celebrate human-built machines defeating the best human minds at variety of activities. IBM Deep Blue's win against Chess champion Gary kasparov in 1997, IBM watson acing Jeopardy in 2011 and now Google DeepMind reportedly wining'Go' with high precision, being cited as a major breakthrough in AI, which even Facebook claims their team came close to acing the game as well. DeepMind goes against the'Go' champion, to be streamed live for the world to witness. While these feats are undoubtedly remarkable, and as understandable its creating quite a buzz in the AI community; as it provides the glimpse to the future seen only in sci-fi. As exciting as it may sound, it leaves a few questions before us.
The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples.
Parameters in deep neural networks which are trained on large-scale databases can generalize across multiple domains, which is referred as "transferability". Unfortunately, the transferability is usually defined as discrete states and it differs with domains and network architectures. Existing works usually heuristically apply parameter-sharing or fine-tuning, and there is no principled approach to learn a parameter transfer strategy. To address the gap, a parameter transfer unit (PTU) is proposed in this paper. The PTU learns a fine-grained nonlinear combination of activations from both the source and the target domain networks, and subsumes hand-crafted discrete transfer states. In the PTU, the transferability is controlled by two gates which are artificial neurons and can be learned from data. The PTU is a general and flexible module which can be used in both CNNs and RNNs. Experiments are conducted with various network architectures and multiple transfer domain pairs. Results demonstrate the effectiveness of the PTU as it outperforms heuristic parameter-sharing and fine-tuning in most settings.
The success of transfer learning on a target task is highly dependent on the selected source data. Instance transfer methods reuse data from the source tasks to augment the training data for the target task. If poorly chosen, this source data may inhibit learning, resulting in negative transfer. The current most widely used algorithm for instance transfer, TrAdaBoost, performs poorly when given irrelevant source data. We present a novel task-based boosting technique for instance transfer that selectively chooses the source knowledge to transfer to the target task. Our approach performs boosting at both the instance level and the task level, assigning higher weight to those source tasks that show positive transferability to the target task, and adjusting the weights of individual instances within each source task via AdaBoost. We show that this combination of task- and instance-level boosting significantly improves transfer performance over existing instance transfer algorithms when given a mix of relevant and irrelevant source data, especially for small amounts of data on the target task.
As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing the current researches of transfer learning by using deep neural network and its applications. We defined deep transfer learning, category and review the recent research works based on the techniques used in deep transfer learning.