Transfer Learning

On Better Exploring and Exploiting Task Relationships in Multi-Task Learning: Joint Model and Feature Learning

Multitask learning (MTL) aims to learn multiple tasks simultaneously by exploiting the interdependence between them. How to measure the relatedness between tasks has long been a central question. There are two main ways to measure it: sharing common parameters and sharing common features across tasks. However, these two types of relatedness are usually learned independently, which leads to a loss of information. In this paper, we propose a new strategy that measures relatedness by jointly learning shared parameters and shared feature representations. The objective of our method is to transform the features of different tasks into a common feature space in which the tasks are closely related and the shared parameters can be better optimized. We give a detailed introduction to the proposed multitask learning method and introduce an alternating algorithm to optimize the nonconvex objective. A theoretical bound is given to demonstrate that the relatedness between tasks can be better measured by our algorithm. We conduct various experiments to verify the superiority of the proposed joint model and feature multitask learning method.
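The abstract does not give the objective, so the following is only a minimal sketch of the general idea under our own assumptions: each task t is linear regression, a shared matrix U maps raw features into a common k-dimensional space, each task has weights w_t in that space, and a shared vector m ties the w_t together through a ridge penalty. The names (`alternating_mtl`, `U`, `m`, `lam`, `beta`) are hypothetical. Because U and w_t multiply, the objective is jointly nonconvex, which is why it is optimized one block of variables at a time.

```python
import numpy as np

def objective(Xs, ys, U, W, m, lam, beta):
    """Squared error of every task in the shared space, a pull of each w_t toward m, and a penalty on U."""
    loss = sum(np.sum((X @ U @ w - y) ** 2) for X, y, w in zip(Xs, ys, W))
    loss += lam * sum(np.sum((w - m) ** 2) for w in W)
    return loss + beta * np.sum(U ** 2)

def alternating_mtl(Xs, ys, k, lam=1.0, beta=1e-3, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    d = Xs[0].shape[1]
    U = rng.normal(scale=0.1, size=(d, k))  # hypothetical shared transform into a k-dim common space
    W = [np.zeros(k) for _ in Xs]           # per-task weights in the common space
    m = np.zeros(k)                         # shared parameter tying the tasks together
    history = [objective(Xs, ys, U, W, m, lam, beta)]
    for _ in range(iters):
        # Block 1: fix U and m. Each w_t has a closed-form ridge solution shrunk toward m.
        for t, (X, y) in enumerate(zip(Xs, ys)):
            Z = X @ U
            W[t] = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y + lam * m)
        # Block 2: fix W. The mean exactly minimizes sum_t ||w_t - m||^2.
        m = np.mean(W, axis=0)
        # Block 3: fix W and m. Gradient step on U, halving the step until the loss does not increase.
        grad = 2 * beta * U
        for X, y, w in zip(Xs, ys, W):
            grad += 2 * np.outer(X.T @ (X @ U @ w - y), w)
        current = objective(Xs, ys, U, W, m, lam, beta)
        step = 1.0
        while step > 1e-10:
            candidate = U - step * grad
            if objective(Xs, ys, candidate, W, m, lam, beta) <= current:
                U = candidate
                break
            step *= 0.5
        history.append(objective(Xs, ys, U, W, m, lam, beta))
    return U, W, m, history

# Toy data: three regression tasks whose true weights are small perturbations of one vector.
rng = np.random.default_rng(1)
w_shared = rng.normal(size=5)
Xs = [rng.normal(size=(40, 5)) for _ in range(3)]
ys = [X @ (w_shared + 0.1 * rng.normal(size=5)) for X in Xs]
U, W, m, history = alternating_mtl(Xs, ys, k=3)
print(f"objective: {history[0]:.2f} -> {history[-1]:.2f}")
```

Blocks 1 and 2 are exact minimizers given the other variables, and the backtracking in Block 3 only accepts a step that does not increase the loss, so the recorded objective is non-increasing; like any alternating scheme on a nonconvex objective, it may still stop at a local optimum.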

Lautum Regularization for Semi-supervised Transfer Learning

Transfer learning is an important tool in deep learning because it allows information to propagate from one "source dataset" to another "target dataset", especially when the latter has few training examples. Yet discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work, we suggest a novel information-theoretic approach for analyzing the performance of deep neural networks in the context of transfer learning. We focus on semi-supervised transfer learning, in which unlabeled samples from the target dataset are available while the network is trained on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by imposing a regularization based on Lautum information that relates the network weights to the target data. We demonstrate the effectiveness of the proposed approach in various transfer learning experiments.
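The abstract does not define Lautum information. For reference, it is the Kullback-Leibler divergence D(P_X P_Y || P_XY), that is, mutual information with the two arguments of the divergence swapped. The sketch below only computes both quantities for a small discrete joint distribution to show how they differ; it is not the paper's regularizer, which relates network weights to target data and would need an estimator for continuous, high-dimensional variables. The function names are hypothetical.

```python
import math

def _marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def lautum_information(joint):
    """L(X;Y) = D(P_X P_Y || P_XY) in nats, for a discrete joint pmf given as a 2-D list."""
    px, py = _marginals(joint)
    total = 0.0
    for i, p_x in enumerate(px):
        for j, p_y in enumerate(py):
            q = p_x * p_y                   # mass under the product of the marginals
            if q == 0:
                continue                    # terms with q = 0 contribute nothing
            if joint[i][j] == 0:
                return math.inf             # P_X P_Y puts mass where P_XY has none
            total += q * math.log(q / joint[i][j])
    return total

def mutual_information(joint):
    """I(X;Y) = D(P_XY || P_X P_Y): the same divergence with its arguments swapped."""
    px, py = _marginals(joint)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log(p / (px[i] * py[j]))
    return total

noisy_copy = [[0.4, 0.1], [0.1, 0.4]]    # Y equals X with probability 0.8
print(lautum_information(noisy_copy))    # log(1.25), about 0.2231
print(mutual_information(noisy_copy))    # about 0.1927
```

Both quantities are zero exactly when X and Y are independent, but they generally differ otherwise, and Lautum information becomes infinite when the joint distribution assigns zero mass to a pair that the product of marginals does not.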

Transfer learning: the dos and don'ts


If you have recently started doing work in deep learning, especially image recognition, you might have seen the abundance of blog posts all over the internet promising to teach you how to build a world-class image classifier in a dozen or fewer lines and just a few minutes on a modern GPU. What's shocking is not the promise but the fact that most of these tutorials end up delivering on it. To those trained in 'conventional' machine learning techniques, the very idea that a model developed for one data set could simply be applied to a different one sounds absurd. The explanation is, of course, transfer learning, one of the most fascinating features of deep neural networks. In this post, we'll first look at what transfer learning is, when it will work, when it might work, and why it won't work in some cases, and finally conclude with some pointers to best practices for transfer learning.

A Principled Approach for Learning Task Similarity in Multitask Learning

Multitask learning aims to solve a set of related tasks simultaneously by exploiting shared knowledge to improve performance on the individual tasks. Hence, an important aspect of multitask learning is understanding the similarities within a set of tasks. Previous work has incorporated this similarity information explicitly (e.g., a weighted loss for each task) or implicitly (e.g., an adversarial loss for feature adaptation) to achieve good empirical performance. However, the theoretical motivations for adding task similarity knowledge are often missing or incomplete. In this paper, we offer a theoretical perspective on this practice. We first provide an upper bound on the generalization error of multitask learning, showing the benefit of explicit and implicit task similarity knowledge. We systematically derive the bounds based on two distinct task similarity metrics: H-divergence and Wasserstein distance. From these theoretical results, we revisit the Adversarial Multi-task Neural Network and propose a new training algorithm that learns the task relation coefficients and the neural network parameters iteratively. We assess the new algorithm empirically on several benchmarks, showing not only that it finds interesting and robust task relations, but also that it outperforms the baselines, reaffirming the benefits of theoretical insight in algorithm design.
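To make explicit task-similarity weighting concrete, here is a minimal sketch that measures similarity with the empirical one-dimensional Wasserstein-1 distance and turns the distances into relation coefficients for a weighted loss. This is a swapped-in heuristic (a fixed softmax over distances, with a hypothetical `temperature` parameter), not the paper's algorithm, which learns the coefficients iteratively together with the network parameters in an adversarial setting.

```python
import math

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D empirical samples with uniform weights.
    In 1-D the optimal coupling matches sorted values, so W1 is the mean gap between them."""
    if len(a) != len(b):
        raise ValueError("this simple form assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def task_coefficients(samples, target, temperature=1.0):
    """Hypothetical heuristic: softmax over negative W1 distances to the target task's sample."""
    dists = [wasserstein_1d(s, samples[target]) for s in samples]
    scores = [math.exp(-d / temperature) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]

def weighted_task_loss(losses, coeffs):
    """Explicit task-similarity weighting: combine per-task losses with the coefficients."""
    return sum(c * l for c, l in zip(coeffs, losses))

# One scalar feature per task; task 1 is a small shift of task 0, task 2 is far away.
samples = [[0, 1, 2, 3], [0.5, 1.5, 2.5, 3.5], [5, 6, 7, 8]]
coeffs = task_coefficients(samples, target=0)
print([round(c, 3) for c in coeffs])  # [0.62, 0.376, 0.004]
```

A smaller `temperature` concentrates weight on the closest tasks; a larger one spreads it more evenly across tasks.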

Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision


There is a catch to training state-of-the-art NLP models: their reliance on massive hand-labeled training sets. That's why data labeling is usually the bottleneck in developing NLP applications and keeping them up-to-date. For example, imagine how much it would cost to pay medical specialists to label thousands of electronic health records. In general, having domain experts label thousands of examples is too expensive. On top of the initial labeling cost, there is another large cost in keeping models up-to-date as real-world contexts change.

What Every NLP Engineer Needs to Know About Pre-Trained Language Models


Practical applications of Natural Language Processing (NLP) have gotten significantly cheaper, faster, and easier due to the transfer learning capabilities enabled by pre-trained language models. Transfer learning enables engineers to pre-train an NLP model on one large dataset and then quickly fine-tune the model to adapt to other NLP tasks. This new approach enables NLP models to learn both lower-level and higher-level features of language, leading to much better model performance for virtually all standard NLP tasks and a new standard for industry best practices. To help you quickly understand the significance of this technical achievement and how it accelerates your own work in NLP, we've summarized the key lessons you should know in easy-to-read bullet-point format. We've also included summaries of the 3 most important research papers in the space that you need to be aware of.

Transfer Learning for Performance Modeling of Configurable Systems: A Causal Analysis

Modern systems (e.g., deep neural networks, big data analytics, and compilers) are highly configurable, which means they exhibit different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations, owing to the sheer size of the configuration space. Transfer learning has been used to reduce measurement effort by transferring knowledge about the performance behavior of systems across environments, and previous research has shown that statistical models are indeed transferable across environments. In this work, we investigate the identifiability and transportability of causal effects and statistical relations in highly configurable systems. Our causal analysis agrees with previous exploratory analysis [Jamshidi17] and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly configurable systems.

Smart City Development With Urban Transfer Learning

IEEE Computer

Many city governments that are just starting smart city development will face a critical cold-start problem: how to develop a new smart city service with limited data. We investigate the common process of urban transfer learning, i.e., leveraging transfer learning to accelerate smart city development, and provide city planners and relevant practitioners with guidelines for applying this novel learning paradigm.

ML for Flood Forecasting at Scale

Effective riverine flood forecasting at scale is hindered by many factors, most notably the reliance of current methods on human calibration, the limited amount of data available for any specific location, and the computational difficulty of building continental- or global-scale models that are sufficiently accurate. Machine learning (ML) is well positioned to help here: learned models often surpass human experts in complex, high-dimensional settings, and transfer or multitask learning offers an appealing way to leverage local signals to improve global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction.

Floods are the most common and deadly natural disaster in the world. Every year, floods cause thousands to tens of thousands of fatalities [1, 22, 2, 21, 14], affect hundreds of millions of people [14, 21, 2], and cause tens of billions of dollars' worth of damage [1, 2]. These numbers have only increased in recent decades [23]. Indeed, the UN charter notes floods as one of the key motivators for formulating the sustainable development goals (SDGs), and directly challenges us: "They knew that earthquakes and floods were inevitable, but that the high death tolls were not."

Multi-Source Transfer Learning for Non-Stationary Environments

Abstract: In data stream mining, predictive models typically suffer drops in predictive performance due to concept drift. Because enough data representing the new concept must be collected before it can be learned well, existing models usually take some time to recover their predictive performance after a concept drift. To speed up recovery from concept drift and improve predictive performance in data stream mining, this work proposes a novel approach called Multi-sourcE onLine TrAnsfer learning for Non-statIonary Environments (Melanie). Melanie is the first approach able to transfer knowledge between multiple data streaming sources in non-stationary environments. It creates several sub-classifiers to learn different aspects of different source and target concepts over time. The sub-classifiers that match the current target concept well are identified and used to compose an ensemble for predicting examples from the target concept. We evaluate Melanie on several synthetic data streams containing different types of concept drift, as well as on real-world data streams. The results indicate that Melanie can deal with a variety of drifts and, by making use of multiple sources, improve predictive performance over existing data stream learning algorithms.

Index Terms: concept drift, non-stationary environment, multi-sources, transfer learning.

Introduction

Many real-world applications produce data in a streaming fashion, i.e., as a sequence of observations that arrive over time. Examples include prediction of customer behaviour, credit card approval, fraud detection, software effort estimation, and software defect prediction, among others. A challenge in data stream mining is how to describe a given target probability distribution accurately without knowing the whole data stream beforehand.
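The abstract says Melanie identifies the sub-classifiers that match the current target concept and combines them into an ensemble, but it does not say how matching is measured or how votes are weighted. The sketch below assumes a simple rule of our own: accuracy on a sliding window of recent target examples, a fixed threshold, and accuracy-weighted voting. `MultiSourceEnsemble` and its parameters are hypothetical, and training the sub-classifiers themselves is out of scope.

```python
from collections import defaultdict, deque

class MultiSourceEnsemble:
    """Hypothetical simplification: keep sub-classifiers learned from several sources and from
    the target, score each on a sliding window of recent target examples, and let the ones
    that currently fit vote."""

    def __init__(self, sub_classifiers, window=50, threshold=0.6):
        self.subs = list(sub_classifiers)   # callables x -> label
        self.window = deque(maxlen=window)  # most recent labelled target examples
        self.threshold = threshold          # minimum recent accuracy to count as "matching"

    def _accuracy(self, clf):
        if not self.window:
            return 1.0                      # no evidence yet: treat every sub-classifier as matching
        return sum(clf(x) == y for x, y in self.window) / len(self.window)

    def predict(self, x):
        scored = [(self._accuracy(clf), clf) for clf in self.subs]
        chosen = [(acc, clf) for acc, clf in scored if acc >= self.threshold]
        if not chosen:
            chosen = [max(scored, key=lambda s: s[0])]  # fall back to the single best match
        votes = defaultdict(float)
        for acc, clf in chosen:
            votes[clf(x)] += acc            # weight each vote by recent accuracy
        return max(votes, key=votes.get)

    def update(self, x, y):
        self.window.append((x, y))          # old examples fall out as the stream moves on

# Usage: one sub-classifier fits the old concept, one fits the concept after a drift.
old_concept = lambda x: int(x > 0)
new_concept = lambda x: int(x < 0)
ensemble = MultiSourceEnsemble([old_concept, new_concept, lambda x: 0], window=6)
for x in [-3, -2, -1, 1, 2, 3]:
    ensemble.update(x, new_concept(x))      # the target stream now follows the new concept
print(ensemble.predict(5), ensemble.predict(-5))  # 0 1
```

After the drift, only the sub-classifier matching the new concept clears the threshold, so it alone decides; the window length trades reaction speed against noise in the accuracy estimates.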