Transfer Learning


Adapters: A Compact and Extensible Transfer Learning Method for NLP

#artificialintelligence

Parameter inefficiency, in the context of transfer learning for NLP, arises when an entirely new model must be trained for every downstream task and the total number of parameters grows too large. A recent paper proposes adapter modules, which provide parameter efficiency by adding only a few trainable parameters per task; as new tasks are added, previously learned ones do not require revisiting. The main idea of the paper is to enable transfer learning for NLP on an incoming stream of tasks without training a new model for every new task. Standard fine-tuning copies the weights of a pre-trained network and tunes them on a downstream task, which requires a new set of weights for each task; in other words, all parameters are adjusted, together with any new layers, for each task.
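As a rough illustration, a bottleneck adapter can be sketched as a small residual module inserted into each Transformer layer; only the adapter (and a task head) is trained while the pre-trained weights stay frozen. The sizes and names below are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a residual
    connection. Only these few parameters are trained for each new task."""
    def __init__(self, d_model=768, d_bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Illustrative training setup: freeze the pre-trained backbone and train adapters only.
# for p in backbone.parameters():
#     p.requires_grad = False
```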


Transfer Learning by Modeling a Distribution over Policies

arXiv.org Artificial Intelligence

We present a transfer learning strategy which fundamentally relies on Bayesian deep learning and the ability to represent a distribution over functions, as in (Bachman et al., 2018; Garnelo et al., 2018). Bayesian methods rely on modeling uncertainty over value functions to represent the agent's belief of the environment. Recent work has shown that neural networks can be used to represent an uncertainty over the space of all possible functions (Bachman et al., 2018). The idea of modeling a distribution over functions can be adapted in the RL setting to model a distribution over policies, such that we can also maximize the entropy over this distribution of policies. This is similar to maximum entropy exploration in RL, where instead of local entropy maximization, recent work maximizes the global entropy over the space of all possible sub-optimal policies.

Exploration and adaptation to new tasks in a transfer learning setup is a central challenge in reinforcement learning. In this work, we build on the idea of modeling a distribution over policies in a Bayesian deep reinforcement learning setup to propose a transfer strategy. Recent works have induced diversity in the learned policies by maximizing the entropy of a distribution of policies (Bachman et al., 2018; Garnelo et al., 2018), and we thus postulate that our proposed approach leads to faster exploration, resulting in improved transfer learning. We support our hypothesis by demonstrating favorable…
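As a point of reference only, the snippet below sketches the standard local entropy bonus added to a policy-gradient loss; it is not the paper's global entropy objective over a distribution of policies, just an illustration (in PyTorch, with an assumed entropy coefficient `beta`) of the mechanism being generalized:

```python
import torch
import torch.nn.functional as F

def policy_loss_with_entropy(logits, actions, returns, beta=0.01):
    """REINFORCE-style loss plus a per-state entropy bonus that encourages exploration.
    logits: (batch, n_actions), actions: (batch,) sampled actions,
    returns: (batch,) discounted returns, beta: entropy coefficient (illustrative)."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    entropy = -(probs * log_probs).sum(dim=-1)   # entropy of the policy at each state
    return -(chosen * returns).mean() - beta * entropy.mean()
```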


An Empirical Evaluation of Adversarial Robustness under Transfer Learning

arXiv.org Machine Learning

In this work, we evaluate adversarial robustness in the context of transfer learning from a source network trained on CIFAR-100 to a target network trained on CIFAR-10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which features learnt with the fast gradient sign method (FGSM) and its iterative alternative (PGD) preserve their defence properties against black- and white-box attacks under three different transfer learning strategies. We find that using PGD examples during training on the source task leads to more general robust features that are easier to transfer. Furthermore, under successful transfer, it achieves 5.2% higher accuracy against white-box PGD attacks than suitable baselines. Overall, our empirical evaluations give insights into how well adversarial robustness can generalise under transfer learning.
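For readers unfamiliar with the two attacks, here is a minimal sketch of FGSM and its iterative PGD variant in PyTorch; the epsilon and step values are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step FGSM: perturb the input along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Iterative FGSM (PGD): repeated small signed-gradient steps, projected back
    into the eps-ball around the clean input after every step."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```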


Transfer Learning with Keras and Deep Learning - PyImageSearch

#artificialintelligence

In this tutorial, you will learn how to perform transfer learning with Keras, deep learning, and Python on your own custom datasets. You've just been hired by Yelp to work in their computer vision department. Yelp has just launched a new feature on its website that allows reviewers to take photos of their food/dishes and then associate them with particular items on a restaurant's menu. Certain nefarious users aren't taking photos of their dishes… instead, they are taking photos of… (well, you can probably guess). Your task is to figure out how to create an automated computer vision application that can distinguish between "food" and "not food", thereby allowing Yelp to continue with its new feature launch and provide value to its users. So, how are you going to build such an application? The answer lies in transfer learning via deep learning. Today marks the start of a brand-new set of tutorials on transfer learning using Keras.
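As a preview of where such a tutorial typically ends up, here is a minimal Keras sketch of transfer learning with a frozen pre-trained base and a new binary "food / not food" head; the backbone choice, input size, and dataset objects are placeholder assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained convolutional base (ImageNet weights) used as a fixed feature extractor.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the transferred weights

# New classification head trained on the target task.
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # datasets are placeholders
```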


Learning What and Where to Transfer

arXiv.org Machine Learning

As the application of deep learning has expanded to real-world problems with an insufficient volume of training data, transfer learning has recently gained much attention as a means of improving performance in such small-data regimes. However, when existing methods are applied between heterogeneous architectures and tasks, managing their detailed configurations becomes more important and often requires exhaustive tuning to reach the desired performance. To address this issue, we propose a novel transfer learning approach based on meta-learning that can automatically learn what knowledge to transfer from the source network, and to where in the target network. Given source and target networks, we propose an efficient training scheme to learn meta-networks that decide (a) which pairs of layers between the source and target networks should be matched for knowledge transfer and (b) which features, and how much knowledge from each feature, should be transferred. We validate our meta-transfer approach against recent transfer learning methods on various datasets and network architectures, on which our automated scheme significantly outperforms prior baselines that decide "what and where to transfer" in a hand-crafted manner.
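The sketch below is not the paper's actual meta-networks or training scheme; it is only a rough illustration of the underlying idea of learnable per-layer-pair transfer weights driving a feature-matching loss, with all names and shapes assumed for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransferWeights(nn.Module):
    """Meta-parameters: one learnable weight per (source layer, target layer) pair,
    deciding *where* to transfer and *how much* (softmax keeps the weights positive)."""
    def __init__(self, n_source_layers, n_target_layers):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_source_layers, n_target_layers))

    def forward(self):
        return torch.softmax(self.logits.flatten(), dim=0).view_as(self.logits)

def matching_loss(source_feats, target_feats, weights):
    """Weighted feature-matching loss over all layer pairs; features are assumed to
    have been projected to a common shape beforehand."""
    loss = 0.0
    for i, fs in enumerate(source_feats):
        for j, ft in enumerate(target_feats):
            loss = loss + weights[i, j] * F.mse_loss(ft, fs.detach())
    return loss
```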


On Better Exploring and Exploiting Task Relationships in Multi-Task Learning: Joint Model and Feature Learning

arXiv.org Artificial Intelligence

Multitask learning (MTL) aims to learn multiple tasks simultaneously by exploiting the interdependence between different tasks. How to measure the relatedness between tasks is a long-standing question. There are mainly two ways to measure relatedness between tasks: sharing common parameters and sharing common features across different tasks. However, these two types of relatedness are usually learned independently, leading to a loss of information. In this paper, we propose a new strategy for measuring relatedness that jointly learns shared parameters and shared feature representations. The objective of our proposed method is to transform the features from different tasks into a common feature space in which the tasks are closely related and the shared parameters can be better optimized. We give a detailed introduction to our proposed multitask learning method. Additionally, an alternating algorithm is introduced to optimize the nonconvex objective. A theoretical bound is given to demonstrate that the relatedness between tasks can be better measured by our proposed multitask learning algorithm. We conduct various experiments to verify the superiority of the proposed joint model and feature multitask learning method.
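As a generic illustration of the joint idea (a common feature space plus parameters shared across tasks), and not the paper's actual model or alternating optimization, one might sketch it as follows; all layer names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class JointMTL(nn.Module):
    """Per-task projections into a common feature space, shared parameters operating
    on that space, and lightweight task-specific output heads."""
    def __init__(self, in_dims, d_common=64):
        super().__init__()
        self.projections = nn.ModuleList([nn.Linear(d, d_common) for d in in_dims])
        self.shared = nn.Sequential(nn.Linear(d_common, d_common), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(d_common, 1) for _ in in_dims])

    def forward(self, xs):  # xs: one input batch per task
        return [head(self.shared(proj(x)))
                for x, proj, head in zip(xs, self.projections, self.heads)]
```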


Lautum Regularization for Semi-supervised Transfer Learning

arXiv.org Machine Learning

Transfer learning is a very important tool in deep learning, as it allows information to be propagated from one "source dataset" to another "target dataset", especially when the latter contains only a small number of training examples. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work we suggest a novel information-theoretic approach for analyzing the performance of deep neural networks in the context of transfer learning. We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during network training on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by imposing a Lautum information based regularization that relates the network weights to the target data. We demonstrate the effectiveness of the proposed approach in various transfer learning experiments.
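For reference, and assuming the standard information-theoretic definitions rather than this paper's exact notation, Lautum information swaps the arguments of the KL divergence that defines mutual information:

\[
I(X;Y) = D_{\mathrm{KL}}\!\left(p_{X,Y} \,\middle\|\, p_X\, p_Y\right),
\qquad
L(X;Y) = D_{\mathrm{KL}}\!\left(p_X\, p_Y \,\middle\|\, p_{X,Y}\right).
\]

The regularizer described in the abstract uses a quantity of this form to tie the network weights to the target data while training on the source dataset.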


Transfer learning: the dos and don'ts

#artificialintelligence

If you have recently started doing work in deep learning, especially image recognition, you might have seen the abundance of blog posts all over the internet promising to teach you how to build a world-class image classifier in a dozen or fewer lines of code and just a few minutes on a modern GPU. What's shocking is not the promise but the fact that most of these tutorials end up delivering on it. To those trained in 'conventional' machine learning techniques, the very idea that a model developed for one dataset could simply be applied to a different one sounds absurd. The answer is, of course, transfer learning, one of the most fascinating features of deep neural networks. In this post, we'll first look at what transfer learning is, when it will work, when it might work, and why it won't work in some cases, before concluding with some pointers to best practices for transfer learning.


A Principled Approach for Learning Task Similarity in Multitask Learning

arXiv.org Machine Learning

Multitask learning aims to solve a set of related tasks simultaneously by exploiting shared knowledge to improve performance on individual tasks. Hence, an important aspect of multitask learning is understanding the similarities within a set of tasks. Previous works have incorporated this similarity information explicitly (e.g., a weighted loss for each task) or implicitly (e.g., an adversarial loss for feature adaptation) to achieve good empirical performance. However, the theoretical motivations for adding task similarity knowledge are often missing or incomplete. In this paper, we offer a theoretical perspective on this practice. We first provide an upper bound on the generalization error of multitask learning, showing the benefit of explicit and implicit task similarity knowledge. We systematically derive the bounds based on two distinct task similarity metrics: the H-divergence and the Wasserstein distance. From these theoretical results, we revisit the Adversarial Multi-task Neural Network, proposing a new training algorithm to learn the task relation coefficients and neural network parameters iteratively. We assess our new algorithm empirically on several benchmarks, showing not only that we find interesting and robust task relations, but also that the proposed approach outperforms the baselines, reaffirming the benefits of theoretical insight in algorithm design.
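For context, and using the standard textbook forms rather than necessarily the paper's exact notation, the two task-similarity metrics are typically defined as:

\[
d_{\mathcal{H}}(P, Q) = 2 \sup_{h \in \mathcal{H}}
\left|\, \Pr_{x \sim P}\!\left[h(x) = 1\right] - \Pr_{x \sim Q}\!\left[h(x) = 1\right] \,\right|,
\]
\[
W_1(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, x') \sim \gamma}\!\left[\lVert x - x' \rVert\right],
\]

where \(\mathcal{H}\) is a hypothesis class and \(\Pi(P, Q)\) is the set of couplings of the two task distributions.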