Transfer Learning


The State of Transfer Learning in NLP

#artificialintelligence

This post expands on the NAACL 2019 tutorial on Transfer Learning in NLP. The tutorial was organized by Matthew Peters, Swabha Swayamdipta, Thomas Wolf, and me. In this post, I highlight key insights and takeaways and provide updates based on recent work. The slides, a Colaboratory notebook, and code of the tutorial are available online. For an overview of what transfer learning is, have a look at this blog post. Transfer learning is a means to extract knowledge from a source setting and apply it to a different target setting. In the span of little more than a year, transfer learning in the form of pretrained language models has become ubiquitous in NLP and has contributed to the state of the art on a wide range of tasks.
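
The core recipe behind most of these results is sequential transfer: pretrain a language model on unlabeled text, then fine-tune it on a labeled target task. As a rough illustration (not the tutorial's code), here is a minimal fine-tuning sketch using the Hugging Face transformers library; the model name, toy data, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch: fine-tune a pretrained language model on a small classification
# task. Model name, data, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # pretrained encoder + new task head

texts = ["a great movie", "a terrible movie"]   # toy target-task data
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                              # a few fine-tuning steps
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```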


Adapters: A Compact and Extensible Transfer Learning Method for NLP

#artificialintelligence

In transfer learning for NLP, parameter inefficiency arises when an entirely new model must be trained for every downstream task, so the total number of parameters grows with the number of tasks. A recent paper proposes adapter modules, which achieve parameter efficiency by adding only a few trainable parameters per task; as new tasks are added, previously learned tasks do not need to be revisited. The main idea of the paper is to enable transfer learning for NLP on an incoming stream of tasks without training a new model for every new task. Standard fine-tuning copies the weights of a pre-trained network and tunes them on a downstream task, which requires a new set of weights for each task. In other words, for each task the parameters are adjusted along with any newly added layers.
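
As a rough sketch of the idea (not the authors' code), an adapter is a small bottleneck layer with a residual connection inserted into a frozen pretrained network; only the adapter parameters are trained per task. The dimensions and placement below are illustrative assumptions.

```python
# Illustrative adapter-module sketch (bottleneck + residual); dimensions and
# placement are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # down-projection
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # up-projection
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))          # residual connection

# Freeze the pretrained weights and train only the per-task adapter.
# `pretrained_layer` stands in for a transformer sub-layer.
pretrained_layer = nn.Linear(768, 768)
for p in pretrained_layer.parameters():
    p.requires_grad = False

adapter = Adapter(hidden_dim=768)
trainable_params = list(adapter.parameters())                # few parameters per task
```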


Transfer Learning by Modeling a Distribution over Policies

arXiv.org Artificial Intelligence

We present a transfer learning strategy which fundamentally relies on Bayesian deep learning and the ability to represent a distribution over functions, as in (Bachman et al., 2018; Garnelo et al., 2018). Bayesian methods rely on modeling uncertainty over value functions to represent the agent's belief about the environment. Recent work has shown that neural networks can be used to represent uncertainty over the space of all possible functions (Bachman et al., 2018). The idea of modeling a distribution over functions can be adapted to the RL setting to model a distribution over policies, such that we can also maximize the entropy over this distribution of policies. This is similar to maximum-entropy exploration in RL, where instead of local entropy maximization, recent work maximizes the global entropy over the space of all possible sub-optimal policies. Exploration and adaptation to new tasks in a transfer learning setup is a central challenge in reinforcement learning. In this work, we build on the idea of modeling a distribution over policies in a Bayesian deep reinforcement learning setup to propose a transfer strategy. Recent works have shown that diversity can be induced in the learned policies by maximizing the entropy of a distribution over policies (Bachman et al., 2018; Garnelo et al., 2018), and we thus postulate that our proposed approach leads to faster exploration and, in turn, improved transfer learning. We support our hypothesis by demonstrating favorable…
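
As a rough formalization of the contrast drawn above (the exact objective in the paper may differ), maximum-entropy RL adds a per-state policy entropy bonus, whereas the strategy described here maximizes the entropy of a distribution over policies:

```latex
% Local (per-state) entropy bonus, as in maximum-entropy RL:
J_{\text{local}}(\pi) = \mathbb{E}_{\pi}\Big[\sum_t r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big]

% Global entropy over a distribution p over policies (sketch of the idea):
J_{\text{global}}(p) = \mathbb{E}_{\pi \sim p}\big[J(\pi)\big] + \alpha\, \mathcal{H}\big(p(\pi)\big)
```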


An Empirical Evaluation of Adversarial Robustness under Transfer Learning

arXiv.org Machine Learning

In this work, we evaluate adversarial robustness in the context of transfer learning from a source network trained on CIFAR-100 to a target network trained on CIFAR-10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which features learnt with the fast gradient sign method (FGSM) and its iterative alternative, projected gradient descent (PGD), preserve their defence properties against black-box and white-box attacks under three different transfer learning strategies. We find that using PGD examples during training on the source task leads to more general robust features that are easier to transfer. Furthermore, under successful transfer, this approach achieves 5.2% higher accuracy against white-box PGD attacks than suitable baselines. Overall, our empirical evaluations give insights into how well adversarial robustness under transfer learning can generalise.
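
For reference, here is a minimal sketch of the two attacks used to generate adversarial training examples; the epsilon, step size, and iteration count are illustrative, not the paper's settings.

```python
# Minimal FGSM and PGD sketches (epsilon/steps are illustrative, not the paper's settings).
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()   # single signed-gradient step

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```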


Transfer Learning with Keras and Deep Learning - PyImageSearch

#artificialintelligence

In this tutorial, you will learn how to perform transfer learning with Keras, deep learning, and Python on your own custom datasets. Imagine you've just been hired by Yelp to work in their computer vision department. Yelp has just launched a new feature on its website that allows reviewers to take photos of their food/dishes and then associate them with particular items on a restaurant's menu. Certain nefarious users aren't taking photos of their dishes…instead, they are taking photos of… (well, you can probably guess). Your task is to figure out how to create an automated computer vision application that can distinguish between "food" and "not food", thereby allowing Yelp to continue with their new feature launch and provide value to their users. So, how are you going to build such an application? The answer lies in transfer learning via deep learning. Today marks the start of a brand new set of tutorials on transfer learning with Keras.
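
A minimal Keras sketch of this kind of transfer-learning recipe follows; the backbone, input size, and training details are illustrative assumptions, not necessarily the ones used in the tutorial.

```python
# Illustrative Keras transfer-learning sketch: frozen ImageNet backbone + new binary head.
# Backbone, input size, and hyperparameters are assumptions, not the tutorial's exact setup.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False                      # keep pretrained features fixed

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # "food" vs. "not food"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_dataset, validation_data=val_dataset, epochs=5)
```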


Learning What and Where to Transfer

arXiv.org Machine Learning

As the application of deep learning has expanded to real-world problems with an insufficient volume of training data, transfer learning has recently gained much attention as a means of improving performance in such small-data regimes. However, when existing methods are applied across heterogeneous architectures and tasks, managing their detailed configurations becomes more important and often requires exhaustive tuning to reach the desired performance. To address this issue, we propose a novel transfer learning approach based on meta-learning that can automatically learn what knowledge to transfer from the source network and where to transfer it in the target network. Given source and target networks, we propose an efficient training scheme to learn meta-networks that decide (a) which pairs of layers between the source and target networks should be matched for knowledge transfer and (b) which features, and how much knowledge from each feature, should be transferred. We validate our meta-transfer approach against recent transfer learning methods on various datasets and network architectures, on which our automated scheme significantly outperforms prior baselines that decide "what and where to transfer" in a hand-crafted manner.
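
As a rough sketch of the idea (a simplification under assumed shapes, not the authors' implementation), learned weights over source/target layer pairs and over feature channels can scale a feature-matching transfer loss:

```python
# Rough sketch of learned "what and where to transfer": learned weights scale
# feature-matching losses between source and target layers. Shapes and losses
# are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransferWeights(nn.Module):
    def __init__(self, n_src_layers, n_tgt_layers, n_channels):
        super().__init__()
        self.pair_logits = nn.Parameter(torch.zeros(n_src_layers, n_tgt_layers))  # "where"
        self.channel_logits = nn.Parameter(torch.zeros(n_channels))               # "what"

    def forward(self):
        return self.pair_logits.softmax(dim=0), self.channel_logits.sigmoid()

def transfer_loss(src_feats, tgt_feats, weights: TransferWeights):
    # src_feats / tgt_feats: lists of [batch, channels] feature tensors,
    # assumed here to share the same channel dimension for simplicity.
    pair_w, chan_w = weights()
    loss = 0.0
    for i, fs in enumerate(src_feats):
        for j, ft in enumerate(tgt_feats):
            loss = loss + pair_w[i, j] * F.mse_loss(chan_w * ft, chan_w * fs.detach())
    return loss
```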


Lautum Regularization for Semi-supervised Transfer Learning

arXiv.org Machine Learning

Transfer learning is a very important tool in deep learning, as it allows information to be propagated from a "source dataset" to a "target dataset", especially when the latter has only a small number of training examples. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work we suggest a novel information-theoretic approach for analysing the performance of deep neural networks in the context of transfer learning. We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during training on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by imposing a Lautum-information-based regularization that relates the network weights to the target data. We demonstrate the effectiveness of the proposed approach in various transfer learning experiments.
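
For context, the Lautum information of Palomar and Verdú is the KL divergence taken in the reverse direction of mutual information; the regularizer described above relates a term of this kind between the network weights and the target data (the schematic objective below is an illustration, not the paper's exact form):

```latex
% Mutual information vs. Lautum information (Palomar & Verdu):
I(X;Y) = D_{\mathrm{KL}}\!\big(P_{XY} \,\|\, P_X \otimes P_Y\big), \qquad
L(X;Y) = D_{\mathrm{KL}}\!\big(P_X \otimes P_Y \,\|\, P_{XY}\big)

% Schematic regularized training objective (not the paper's exact form):
\min_{W}\; \mathbb{E}\big[\ell_{\text{source}}(W)\big] \;+\; \lambda\, L\big(W;\, X_{\text{target}}\big)
```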


On Better Exploring and Exploiting Task Relationships in Multi-Task Learning: Joint Model and Feature Learning

arXiv.org Artificial Intelligence

Multitask learning (MTL) aims to learn multiple tasks simultaneously by exploiting the interdependence between tasks. How to measure the relatedness between tasks remains an open question. There are mainly two ways to measure relatedness: sharing common parameters and sharing common features across different tasks. However, these two types of relatedness are typically learned independently, leading to a loss of information. In this paper, we propose a new strategy for measuring relatedness that jointly learns shared parameters and shared feature representations. The objective of our proposed method is to transform the features from different tasks into a common feature space in which the tasks are closely related and the shared parameters can be better optimized. We give a detailed introduction to our proposed multitask learning method. Additionally, an alternating algorithm is introduced to optimize the nonconvex objective. A theoretical bound is given to demonstrate that the relatedness between tasks can be better measured by our proposed algorithm. We conduct various experiments to verify the superiority of the proposed joint model and feature learning multitask method.
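
As a toy sketch of joint feature and parameter sharing (a simplification under assumed shapes, not the paper's formulation), tasks can share a feature transformation into a common space while their predictors share a common parameter component with small task-specific deviations:

```python
# Toy sketch of joint feature and parameter sharing across tasks; shapes and the
# regularizer are illustrative assumptions, not the paper's exact objective.
import torch
import torch.nn as nn

class JointMTL(nn.Module):
    def __init__(self, in_dim, shared_dim, n_tasks):
        super().__init__()
        self.shared_map = nn.Linear(in_dim, shared_dim)               # shared feature space
        self.w_shared = nn.Parameter(torch.zeros(shared_dim))         # shared predictor part
        self.w_task = nn.Parameter(torch.zeros(n_tasks, shared_dim))  # task-specific deviations

    def forward(self, x, task_id):
        z = torch.tanh(self.shared_map(x))          # map features into the common space
        w = self.w_shared + self.w_task[task_id]    # shared + task-specific parameters
        return z @ w

    def deviation_penalty(self):
        # Encourage task predictors to stay close to the shared component.
        return self.w_task.pow(2).sum()
```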