Transfer Learning: Overviews


The State of Transfer Learning in NLP

#artificialintelligence

This post expands on the NAACL 2019 tutorial on Transfer Learning in NLP. The tutorial was organized by Matthew Peters, Swabha Swayamdipta, Thomas Wolf, and me. In this post, I highlight key insights and takeaways and provide updates based on recent work. The slides, a Colaboratory notebook, and code of the tutorial are available online. For an overview of what transfer learning is, have a look at this blog post. Transfer learning is a means to extract knowledge from a source setting and apply it to a different target setting. In the span of little more than a year, transfer learning in the form of pretrained language models has become ubiquitous in NLP and has contributed to the state of the art on a wide range of tasks.
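As a toy illustration of extracting knowledge from a source setting and applying it to a target setting, the sketch below pretrains a one-parameter linear model on source data and then fine-tunes it for a few steps on a small target set. The data, the learning rate, and the helper names `mse` and `train` are all assumptions made for illustration; this is not the pretrained language model setup covered in the tutorial.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, steps, lr=0.05):
    """Run plain gradient descent on the MSE, starting from weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical source task (plenty of training) and related target task (little data).
source = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2.0 * x
target = [(1.0, 2.2), (2.0, 4.4)]              # generated by y = 2.2 * x

w_pre = train(0.0, source, steps=50)        # "pretraining" on the source task
w_transfer = train(w_pre, target, steps=3)  # fine-tuning from the pretrained weight
w_scratch = train(0.0, target, steps=3)     # same budget, no transfer

loss_transfer = mse(w_transfer, target)
loss_scratch = mse(w_scratch, target)
print(loss_transfer < loss_scratch)
```

With the same small training budget on the target task, the run that starts from the source-trained weight ends closer to the target solution than the run that starts from zero, because the two tasks are related.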


Theoretical Guarantees of Transfer Learning

arXiv.org Machine Learning

Transfer learning has proven effective when labeled data in the target domain is scarce. Many works have developed successful algorithms and empirically observed a positive transfer effect, in which source knowledge improves the target generalization error. However, theoretical analysis of transfer learning is more challenging due to the nature of the problem and is therefore less studied. In this report, we survey theoretical works in transfer learning and summarize key theoretical guarantees that prove the effectiveness of transfer learning. The theoretical bounds are derived using model complexity and learning algorithm stability. As we shall see, these works exhibit a trade-off between tight bounds and restrictive assumptions. Moreover, we also prove a new generalization bound for the multi-source transfer learning problem using VC theory, which is more informative than the one proved in previous work.


A Survey on Deep Transfer Learning

arXiv.org Machine Learning

As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale, well-annotated dataset due to the expense of data acquisition and costly annotation, which limits the development of deep learning in those domains. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey reviews current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, then categorize and review recent research works based on the techniques used in deep transfer learning.


Spatial Projection of Multiple Climate Variables Using Hierarchical Multitask Learning

AAAI Conferences

Future projections of climate are typically obtained by combining outputs from multiple Earth System Models (ESMs) for several climate variables such as temperature and precipitation. While the IPCC has traditionally used a simple model output average, recent work has illustrated potential advantages of using a multitask learning (MTL) framework for projections of individual climate variables. In this paper we introduce a framework for hierarchical multitask learning (HMTL) with two levels of tasks such that each super-task, i.e., a task at the top level, is itself a multitask learning problem over sub-tasks. For climate projections, each super-task focuses on projections of a specific climate variable spatially using an MTL formulation. For the proposed HMTL approach, a group lasso regularization is added to couple parameters across the super-tasks, which in the climate context helps exploit relationships among the behavior of different climate variables at a given spatial location. We show that some recent works on MTL based on learning task dependency structures can be viewed as special cases of HMTL. Experiments on synthetic and real climate data show that HMTL produces better results than decoupled MTL methods applied separately to the super-tasks, and that HMTL significantly outperforms baselines for climate projection.
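To make the coupling idea concrete, here is a minimal sketch of a group lasso penalty computed across super-tasks. It assumes a hypothetical layout in which `weights[s][j]` is the coefficient of index `j` for super-task `s`, and that each group collects the coefficients sharing the same index `j` across all super-tasks. The grouping, the function name `group_lasso_penalty`, and the strength `lam` are illustrative assumptions; the paper's exact formulation may differ.

```python
import math


def group_lasso_penalty(weights, lam=1.0):
    """Sum of Euclidean norms of the groups, scaled by lam.

    weights[s][j] is the coefficient with index j for super-task s
    (an assumed layout). Each group is one column: the same index j
    taken across every super-task.
    """
    num_columns = len(weights[0])
    total = 0.0
    for j in range(num_columns):
        group = [row[j] for row in weights]
        total += math.sqrt(sum(w * w for w in group))
    return lam * total


# Two hypothetical super-tasks (e.g. two climate variables), three coefficients each.
W = [
    [3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0],
]
penalty = group_lasso_penalty(W, lam=0.5)
print(penalty)
```

Because the norm is taken over a whole group rather than over each coefficient separately, the penalty favors solutions in which a group is either zero for all super-tasks or active for several of them, which is how the regularizer ties the super-tasks together.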

