Transfer learning allows practitioners to recognize and apply knowledge learned in previous tasks (source task) to new tasks or new domains (target task), which share some commonality. The two important factors impacting the performance of transfer learning models are: (a) the size of the target dataset, and (b) the similarity in distribution between source and target domains. Thus far, there has been little investigation into just how important these factors are. In this paper, we investigate the impact of target dataset size and source/target domain similarity on model performance through a series of experiments. We find that more data is always beneficial, and model performance improves linearly with the log of data size, until we are out of data. As source/target domains differ, more data is required and fine tuning will render better performance than feature extraction. When source/target domains are similar and data size is small, fine tuning and feature extraction renders equivalent performance. Our hope is that by beginning this quantitative investigation on the effect of data volume and domain similarity in transfer learning we might inspire others to explore the significance of data in developing more accurate statistical models.
Nickisch, and Harmeling 2009; Parikh and Grauman We propose an attention-guided transfer network. Briefly, 2011; Akata et al. 2013), learn object models expediently our approach works as follows. First, the network receives by providing information about multiple object classes training images for attributes in both the source and target with each attribute label (Kovashka, Vijayanarasimhan, and domains. Second, it separately learns models for the attributes Grauman 2011; Parkash and Parikh 2012), interactively recognize in each domain, and then measures how related each fine-grained object categories (Branson et al. 2010; target domain classifier is to the classifiers in the source domains. Wah and Belongie 2013), and learn to retrieve images from Finally, it uses these measures of similarity (relatedness) precise human feedback (Kumar et al. 2011; Kovashka, to compute a weighted combination of the source classifiers, Parikh, and Grauman 2015). Recent ConvNet approaches which then becomes the new classifier for the target have shown how to learn accurate attribute models through attribute. We develop two methods, one where the target and multi-task learning (Fouhey, Gupta, and Zisserman 2016; source domains are disjoint, and another where there is some Huang et al. 2015) or by localizing attributes (Xiao and overlap between them. Importantly, we show that when the Jae Lee 2015; Singh and Lee 2016). However, deep learning source attributes come from a diverse set of domains, the with ConvNets requires a large amount of data to be available gain we obtain from this transfer of knowledge is greater for the task of interest, or for a related task (Oquab et than if only use attributes from the same domain.
Conventional face super-resolution methods usually assume testing low-resolution (LR) images lie in the same domain as the training ones. Due to different lighting conditions and imaging hardware, domain gaps between training and testing images inevitably occur in many real-world scenarios. Neglecting those domain gaps would lead to inferior face super-resolution (FSR) performance. However, how to transfer a trained FSR model to a target domain efficiently and effectively has not been investigated. To tackle this problem, we develop a Domain-Aware Pyramid-based Face Super-Resolution network, named DAP-FSR network. Our DAP-FSR is the first attempt to super-resolve LR faces from a target domain by exploiting only a pair of high-resolution (HR) and LR exemplar in the target domain. To be specific, our DAP-FSR firstly employs its encoder to extract the multi-scale latent representations of the input LR face. Considering only one target domain example is available, we propose to augment the target domain data by mixing the latent representations of the target domain face and source domain ones, and then feed the mixed representations to the decoder of our DAP-FSR. The decoder will generate new face images resembling the target domain image style. The generated HR faces in turn are used to optimize our decoder to reduce the domain gap. By iteratively updating the latent representations and our decoder, our DAP-FSR will be adapted to the target domain, thus achieving authentic and high-quality upsampled HR faces. Extensive experiments on three newly constructed benchmarks validate the effectiveness and superior performance of our DAP-FSR compared to the state-of-the-art.
We present an image-conditional image generation model. The model transfers an input domain to a target domain in semantic level, and generates the target image in pixel level. To generate realistic target images, we employ the real/fake-discriminator as in Generative Adversarial Nets, but also introduce a novel domain-discriminator to make the generated image relevant to the input image. We verify our model through a challenging task of generating a piece of clothing from an input image of a dressed person. We present a high quality clothing dataset containing the two domains, and succeed in demonstrating decent results.
Deep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling so much data might be infeasible in many cases. In this paper, we introduce a source-target selective joint fine-tuning scheme for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task with insufficient training data is carried out simultaneously with another source learning task with abundant training data. However, the source learning task does not use all existing training data. Our core idea is to identify and use a subset of training images from the original source learning task whose low-level characteristics are similar to those from the target learning task, and jointly fine-tune shared convolutional layers for both tasks. Specifically, we compute descriptors from linear or nonlinear filter bank responses on training images from both tasks, and use such descriptors to search for a desired subset of training samples for the source learning task. Experiments demonstrate that our selective joint fine-tuning scheme achieves state-of-the-art performance on multiple visual classification tasks with insufficient training data for deep learning. Such tasks include Caltech 256, MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to fine-tuning without a source domain, the proposed method can improve the classification accuracy by 2% - 10% using a single model.