Transfer Learning
Domain Independent SVM for Transfer Learning in Brain Decoding
Zhou, Shuo, Li, Wenwen, Cox, Christopher R., Lu, Haiping
Brain imaging data are important in brain sciences yet expensive to obtain, with large volume (i.e., large p) but small sample size (i.e., small n). To tackle this problem, transfer learning is a promising direction that leverages source data to improve performance on related target data. Most transfer learning methods focus on minimizing data distribution mismatch. However, a big challenge in brain imaging is the large domain discrepancies in cognitive experiment designs and subject-specific structures and functions. A recent transfer learning approach minimizes domain dependence to learn common features across domains, via the Hilbert-Schmidt Independence Criterion (HSIC). Inspired by this method, we propose a new Domain Independent Support Vector Machine (DI-SVM) for transfer learning in brain condition decoding. Specifically, DI-SVM simultaneously minimizes the SVM empirical risk and the dependence on domain information via a simplified HSIC. We use public data to construct 13 transfer learning tasks in brain decoding, including three interesting multi-source transfer tasks. Experiments show DI-SVM's superior performance over eight competing methods on these tasks, in particular an improvement of more than 24% on the multi-source transfer tasks.
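To make the dependence term concrete, here is a minimal Python sketch of the standard biased empirical HSIC estimator, tr(KHLH) / (n - 1)^2, where K is a kernel over features, L a kernel over domain labels, and H the centering matrix. This is not the paper's simplified HSIC, which the abstract does not spell out. The function names, the linear kernel and the toy data are illustrative assumptions, and the code is kept dependency-free.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


def same_label_kernel(labels):
    """Kernel over domain labels: 1.0 if two samples share a domain, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def center(K):
    """Return H K H, i.e. K with row and column means removed."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total for j in range(n)]
            for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc = center(K)
    # tr(K H L H) equals tr((H K H) L) by the cyclic property of the trace.
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Toy data (assumed): four samples, two domains.
domains = [0, 0, 1, 1]
L = same_label_kernel(domains)

# A feature that simply encodes the domain versus one unrelated to it.
domain_dependent = linear_kernel([[1.0], [1.0], [-1.0], [-1.0]])
domain_independent = linear_kernel([[1.0], [-1.0], [1.0], [-1.0]])

hsic_dependent = hsic(domain_dependent, L)
hsic_independent = hsic(domain_independent, L)
```

In this toy case the domain-encoding feature yields a clearly positive HSIC while the unrelated feature yields zero, which is the quantity a domain-independent learner would try to drive down alongside its classification loss.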
Transfer learning: the dos and don'ts
If you have recently started doing work in deep learning, especially image recognition, you might have seen the abundance of blog posts all over the internet, promising to teach you how to build a world-class image classifier in a dozen or fewer lines and just a few minutes on a modern GPU. What's shocking is not the promise but the fact that most of these tutorials end up delivering on it. To those trained in 'conventional' machine learning techniques, the very idea that a model developed for one data set could simply be applied to a different one sounds absurd. The answer is, of course, transfer learning, one of the most fascinating features of deep neural networks. In this post, we'll first look at what transfer learning is, when it will work, when it might work, and why it won't work in some cases, finally concluding with some pointers on best practices for transfer learning.
A Principled Approach for Learning Task Similarity in Multitask Learning
Shui, Changjian, Abbasi, Mahdieh, Robitaille, Louis-Émile, Wang, Boyu, Gagné, Christian
Multitask learning aims at solving a set of related tasks simultaneously, by exploiting the shared knowledge for improving the performance on individual tasks. Hence, an important aspect of multitask learning is to understand the similarities within a set of tasks. Previous works have incorporated this similarity information explicitly (e.g., weighted loss for each task) or implicitly (e.g., adversarial loss for feature adaptation), for achieving good empirical performances. However, the theoretical motivations for adding task similarity knowledge are often missing or incomplete. In this paper, we give a different perspective from a theoretical point of view to understand this practice. We first provide an upper bound on the generalization error of multitask learning, showing the benefit of explicit and implicit task similarity knowledge. We systematically derive the bounds based on two distinct task similarity metrics: H divergence and Wasserstein distance. From these theoretical results, we revisit the Adversarial Multi-task Neural Network, proposing a new training algorithm to learn the task relation coefficients and neural network parameters iteratively. We assess our new algorithm empirically on several benchmarks, showing not only that we find interesting and robust task relations, but that the proposed approach outperforms the baselines, reaffirming the benefits of theoretical insight in algorithm design.
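As a rough illustration of two ingredients named in this abstract, the Python sketch below computes a Wasserstein-1 distance between two equal-sized one-dimensional samples and combines per-task losses with explicit relation coefficients. It does not reproduce the paper's bounds or its training algorithm. The function names, the restriction to 1-D equal-sized samples, and the example numbers are all assumptions made for illustration.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D empirical samples.

    For equal sample sizes this reduces to the mean absolute difference
    between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def weighted_task_loss(task_losses, coefficients):
    """Explicit task similarity: a coefficient-weighted sum of per-task losses."""
    if len(task_losses) != len(coefficients):
        raise ValueError("one coefficient per task is required")
    return sum(c * loss for c, loss in zip(coefficients, task_losses))


# Hypothetical feature samples from two tasks, shifted by one unit.
distance = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])

# Hypothetical per-task losses combined with equal relation coefficients.
combined = weighted_task_loss([2.0, 4.0], [0.5, 0.5])
```

A smaller distance between two tasks' feature distributions would, under this reading, justify a larger relation coefficient between them; how the coefficients are actually learned is specific to the paper.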
Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision
There is a catch to training state-of-the-art NLP models: their reliance on massive hand-labeled training sets. That's why data labeling is usually the bottleneck in developing NLP applications and keeping them up-to-date. For example, imagine how much it would cost to pay medical specialists to label thousands of electronic health records. In general, having domain experts label thousands of examples is too expensive. On top of the initial labeling cost, there is another huge cost in keeping models up-to-date with changing contexts in the real world.
What Every NLP Engineer Needs to Know About Pre-Trained Language Models
Practical applications of Natural Language Processing (NLP) have gotten significantly cheaper, faster, and easier due to the transfer learning capabilities enabled by pre-trained language models. Transfer learning enables engineers to pre-train an NLP model on one large dataset and then quickly fine-tune the model to adapt to other NLP tasks. This new approach enables NLP models to learn both lower-level and higher-level features of language, leading to much better model performance for virtually all standard NLP tasks and a new standard for industry best practices. To help you quickly understand the significance of this technical achievement and how it accelerates your own work in NLP, we've summarized the key lessons you should know in easy-to-read bullet-point format. We've also included summaries of the 3 most important research papers in the space that you need to be aware of. If these accessible AI research analyses & summaries are useful for you, you can subscribe to receive our regular industry updates below.
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
Triantafillou, Eleni, Zhu, Tyler, Dumoulin, Vincent, Lamblin, Pascal, Xu, Kelvin, Goroshin, Ross, Gelada, Carles, Swersky, Kevin, Manzagol, Pierre-Antoine, Larochelle, Hugo
Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle this recently, we find the current procedure and datasets that are used to systematically assess progress in this setting lacking. To address this, we propose Meta-Dataset: a new benchmark for training and evaluating few-shot classifiers that is large-scale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of state-of-the-art models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure robustness of current methods to variations in the number of available examples and the number of classes. Finally, our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular few-shot classification methods, and to propose a new method, Proto-MAML, which achieves improved performance on our benchmark.
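For readers unfamiliar with the few-shot setting, the Python sketch below shows the classification rule at the core of Prototypical Networks: each class is represented by the mean of its few support embeddings, and a query is assigned to the nearest prototype. It assumes the embeddings have already been produced by some network. The function names and the two-dimensional toy embeddings are illustrative, and the sketch covers neither training nor Proto-MAML.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support_set):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support_set.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# A hypothetical 2-way, 2-shot episode with made-up embeddings.
support_set = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
prototypes = build_prototypes(support_set)

prediction_near_cat = classify([1.0, 1.0], prototypes)
prediction_near_dog = classify([4.0, 5.0], prototypes)
```

In the full method the embedding network is trained so that this nearest-prototype rule works well across many sampled episodes; the benchmark described above varies the number of classes and examples per episode.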
Learning Representations from Persian Handwriting for Offline Signature Verification, a Deep Transfer Learning Approach
Mersa, Omid, Etaati, Farhood, Masoudnia, Saeed, Araabi, Babak N.
Offline Signature Verification (OSV) is a challenging pattern recognition task, especially when it is expected to generalize well to skilled forgeries that are not available during training. Its challenges also include small training samples and large intra-class variations. Considering these limitations, we suggest a novel transfer learning approach from the Persian handwriting domain to the multi-language OSV domain. We train two Residual CNNs on the source domain separately, based on two different tasks: word classification and writer identification. Since identifying a person's signature resembles identifying one's handwriting, it seems natural to use handwriting for the feature learning phase. The representation learned on the more varied and plentiful handwriting dataset can compensate for the lack of training data in the original task, i.e. OSV, without sacrificing generalizability. Our proposed OSV system includes two steps: learning representation and verification of the input signature. For the first step, the signature images are fed into the trained Residual CNNs. The output representations are then used to train SVMs for the verification. We test our OSV system on three different signature datasets, including MCYT (a Spanish signature dataset), UTSig (a Persian one) and GPDS-Synthetic (an artificial dataset). On UTSig, we achieved 9.80% Equal Error Rate (EER), which showed substantial improvement over the best EER in the literature, 17.45%. Our proposed method surpassed the state of the art by 6% on GPDS-Synthetic, achieving 6.81%. On MCYT, an EER of 3.98% was obtained, which is comparable to the best previously reported results.
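Since the results above are reported as Equal Error Rate, a small Python sketch of how an EER can be estimated from verifier scores may help. It assumes higher scores mean "more likely genuine", sweeps every observed score as a threshold, and reports the point where the false acceptance and false rejection rates are closest. The function names and scores are made up for illustration, and this is not the evaluation code used in the paper.

```python
def error_rates(genuine_scores, forgery_scores, threshold):
    """Return (FAR, FRR) when signatures scoring >= threshold are accepted."""
    false_rejects = sum(score < threshold for score in genuine_scores)
    false_accepts = sum(score >= threshold for score in forgery_scores)
    return (false_accepts / len(forgery_scores),
            false_rejects / len(genuine_scores))


def equal_error_rate(genuine_scores, forgery_scores):
    """Approximate the EER by sweeping all observed scores as thresholds.

    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that minimises their absolute difference.
    """
    best = None
    for threshold in sorted(set(genuine_scores) | set(forgery_scores)):
        far, frr = error_rates(genuine_scores, forgery_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Hypothetical verifier scores for one user.
genuine = [0.9, 0.8, 0.7, 0.4]
forged = [0.6, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(genuine, forged)
```

With these toy scores one genuine signature falls below the best threshold and one forgery lies at or above it, so both error rates equal 25%. A lower EER means the genuine and forged score distributions overlap less.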
Active Transfer Learning for Persian Offline Signature Verification
Younesian, Taraneh, Masoudnia, Saeed, Hosseini, Reshad, Araabi, Babak N.
Offline Signature Verification (OSV) remains a challenging pattern recognition task, especially in the presence of skilled forgeries that are not available during training. This challenge is aggravated when little labeled training data is available and intra-personal variations are large. In this study, we address this issue by employing an active learning approach, which selects the most informative instances to label and therefore reduces the human labeling effort significantly. Our proposed OSV includes three steps: feature learning, active learning, and final verification. We benefit from transfer learning using a pre-trained CNN for feature learning. We also propose SVM-based active learning for each user to separate his genuine signatures from the random forgeries. Finally, we use the SVMs to verify the authenticity of the questioned signature. We examined our proposed active transfer learning method on UTSig, a Persian offline signature dataset. We achieved nearly 13% improvement compared to the random selection of instances. Our results also showed 1% improvement over the state-of-the-art method in which a fully supervised setting with five more labeled instances per user was used.
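The abstract does not say how "most informative" is measured. One common choice for SVM-based active learning is margin-based uncertainty sampling: ask for labels on the unlabeled instances that lie closest to the current decision boundary. The Python sketch below illustrates only that selection step, under that assumption. The signed decision scores, instance identifiers and function name are hypothetical.

```python
def select_most_uncertain(decision_scores, budget):
    """Pick the instances whose SVM decision scores are closest to zero.

    decision_scores maps an instance id to its signed distance from the
    decision boundary; budget is how many labels can be requested.
    """
    ranked = sorted(decision_scores, key=lambda item: abs(decision_scores[item]))
    return ranked[:budget]


# Hypothetical decision scores for four unlabeled signature images.
unlabeled_scores = {
    "sig_a": 2.1,
    "sig_b": -0.2,
    "sig_c": 0.05,
    "sig_d": -1.7,
}

to_label = select_most_uncertain(unlabeled_scores, budget=2)
```

In a full loop the selected instances would be labeled by a human, added to the training set, and the per-user SVM retrained before the next round of selection.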
Transfer Learning for Performance Modeling of Configurable Systems: A Causal Analysis
Javidian, Mohammad Ali, Jamshidi, Pooyan, Valtorta, Marco
Modern systems (e.g., deep neural networks, big data analytics, and compilers) are highly configurable, which means they expose different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations due to the sheer size of the configuration space. Transfer learning has been used to reduce the measurement efforts by transferring knowledge about performance behavior of systems across environments. Previously, research has shown that statistical models are indeed transferable across environments. In this work, we investigate identifiability and transportability of causal effects and statistical relations in highly-configurable systems. Our causal analysis agrees with previous exploratory analysis \cite{Jamshidi17} and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly-configurable systems.
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
Whatmough, Paul N., Zhou, Chuteng, Hansen, Patrick, Venkataramanaiah, Shreyas Kolala, Seo, Jae-sun, Mattina, Matthew
The computational demands of computer vision tasks based on state-of-the-art Convolutional Neural Network (CNN) image classification far exceed the energy budgets of mobile devices. This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features, and a conventional programmable CNN accelerator which processes a dataset-specific CNN. Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part, and the programmable part being learnt on the target dataset. Experimental results demonstrate FixyNN hardware can achieve very high energy efficiencies up to 26.6 TOPS/W (4.81× better than iso-area programmable accelerator). Over a suite of six datasets we trained models via transfer learning with an accuracy loss of 1% resulting in up to 11.2 TOPS/W - nearly 2× more efficient than a conventional programmable CNN accelerator of the same area. Mobile devices exhibit constraints in the energy and silicon area that can be allocated to CV tasks, which limits the adoption of CNNs at high resolution and frame-rate. [Figure 1: FixyNN proposes to split a deep CNN into two parts, which are implemented in hardware using a (shared) fixed-weight feature extractor (FFE) hardware accelerator ...]