Transfer Learning


Beyond Shared Hierarchies: Deep Multitask Learning through Soft Layer Ordering

arXiv.org Machine Learning

Existing deep multitask learning (MTL) approaches align layers shared between tasks in a parallel ordering. Such an organization significantly constricts the types of shared structure that can be learned. The necessity of parallel ordering for deep MTL is first tested by comparing it with permuted ordering of shared layers. The results indicate that a flexible ordering can enable more effective sharing, thus motivating the development of a soft ordering approach, which learns how shared layers are applied in different ways for different tasks. Deep MTL with soft ordering outperforms parallel ordering methods across a series of domains. These results suggest that the power of deep MTL comes from learning highly general building blocks that can be assembled to meet the demands of each task.
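The idea of soft ordering can be illustrated with a small sketch. It assumes, as one plausible reading of the abstract, that each task holds a learned weight for every shared layer at every depth, and that the output at each depth is the softmax-weighted mixture of all shared layers applied to the previous activation. The function names, the toy layers, and the logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_ordering_forward(x, layers, task_logits):
    """Apply shared layers with a task-specific soft ordering.

    x           -- input vector (list of floats)
    layers      -- shared layers, each a callable mapping a vector to a vector
    task_logits -- one list of logits per depth; each list has one entry per layer
    """
    h = list(x)
    for depth_logits in task_logits:
        mix = softmax(depth_logits)               # how much each layer counts here
        outputs = [layer(h) for layer in layers]  # every shared layer sees h
        h = [sum(w * out[i] for w, out in zip(mix, outputs))
             for i in range(len(h))]
    return h


# Two toy shared "layers" (assumptions, stand-ins for trained network layers)
double = lambda v: [2 * a for a in v]
shift = lambda v: [a + 1 for a in v]

# Strongly peaked logits approximate a hard ordering: double, then shift
task_a = soft_ordering_forward([1.0], [double, shift], [[50, 0], [0, 50]])
# Another task can use the same layers in the opposite order
task_b = soft_ordering_forward([1.0], [double, shift], [[0, 50], [50, 0]])
```

In a real model the logits would be trained jointly with the layer weights, so each task discovers its own mixture rather than having it fixed by hand.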


Label Space Driven Heterogeneous Transfer Learning With Web Induced Alignment

AAAI Conferences

Heterogeneous Transfer Learning (HTL) algorithms leverage knowledge from a heterogeneous source domain to perform a task in a target domain. We present a novel HTL algorithm that works even when there are no shared features or instance correspondences and, further, the two domains do not have identical labels. We utilize the label relationships via web-distance to align the data of the domains in the projected space, while preserving the structure of the original data.


Enhancing RNN Based OCR by Transductive Transfer Learning From Text to Images

AAAI Conferences

This paper presents a novel approach to optical character recognition (OCR) that accelerates training and avoids underfitting by transferring knowledge from text. Previously proposed OCR models typically take a long time to train and require a large amount of labelled data to avoid underfitting. In contrast, our method has no such requirement. The challenge lies in transferring the sequential relationships between characters from text to OCR. We build a model based on transductive transfer learning to achieve domain adaptation from text to image. We thoroughly evaluate our approach on different datasets, including a general one and a relatively small one. We also compare the performance of our model with the general OCR model under different circumstances. We show that (1) our approach reduces the time cost of the training phase by 20-30%; and (2) our approach can avoid underfitting when the model is trained on a small dataset.


Few Shot Transfer Learning Between Word Relatedness and Similarity Tasks Using A Gated Recurrent Siamese Network

AAAI Conferences

Word similarity and word relatedness are fundamental to natural language processing and, more generally, to understanding how humans relate concepts in semantic memory. A growing number of datasets are being proposed as evaluation benchmarks; however, the heterogeneity and focus of each respective dataset make it difficult to draw plausible conclusions as to how a unified semantic model would perform. Additionally, we want to identify the transferability of knowledge obtained from one task to another, within the same domain and across domains. Hence, this paper first presents an evaluation and comparison of eight chosen datasets tested using the best performing regression models. As a baseline, we present regression models that incorporate both lexical features and word embeddings to produce consistent and competitive results compared to the state of the art. We present our main contribution, the best performing model across seven of the eight datasets - a Gated Recurrent Siamese Network that learns relationships between lexical word definitions. A parameter transfer learning strategy is employed for the Siamese Network. Subsequently, we present a secondary contribution which is the best performing non-sequential model: an Inductive and Transductive Transfer Learning strategy for transferring decision trees within a Random Forest to a target task that is learned from only a few instances. The method involves measuring semantic distance between hidden factored matrix representations of decision tree traversal matrices.


Distant-Supervision of Heterogeneous Multitask Learning for Social Event Forecasting With Multilingual Indicators

AAAI Conferences

Open-source indicators such as social media can be very effective precursors for forecasting future societal events. As events are often preceded by social indicators generated by groups of people speaking many different languages, multiple languages need to be considered to ensure comprehensive event forecasting. However, this leads to several technical challenges for traditional models: 1) high dimension, sparsity, and redundancy of features; 2) translation correlation among the multilingual features; and 3) lack of language-wise supervision. In order to simultaneously address these issues, we present a novel model capable of distant-supervision of heterogeneous multitask learning (DHML) for multilingual spatial social event forecasting. This model maps the multilingual heterogeneous features into several latent semantic spaces and then enforces a similar sparsity pattern across them all, using distant supervision across all the languages involved. Optimizing this model poses a difficult nonconvex and nonsmooth problem, which can be decomposed into simpler subproblems using the Alternating Direction Method of Multipliers (ADMM). A novel dynamic programming-based algorithm is proposed to solve one challenging subproblem efficiently. Theoretical properties of the proposed algorithm are analyzed. The results of extensive experiments on multiple real-world datasets are presented to demonstrate the effectiveness, efficiency, and interpretability of the proposed approach.


Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception

AAAI Conferences

This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to choose a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents' local summaries (i.e., based on their current choices of support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanism for a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on real-world datasets shows that our algorithms outperform the state-of-the-art methods.


Batchwise Patching of Classifiers

AAAI Conferences

In this work we present classifier patching, an approach for adapting an existing black-box classification model to new data. Instead of creating a new model, patching infers regions in the instance space where the existing model is error-prone by training a classifier on the previously misclassified data. It then learns a specific model to determine the error regions, which allows it to patch the old model’s predictions for them. Patching relies on a strong, albeit unchangeable, existing base classifier, and the idea that the true labels of seen instances will be available in batches at some point in time after the original classification. We experimentally evaluate our approach, and show that it meets the original design goals. Moreover, we compare our approach to existing methods from the domain of ensemble stream classification in both concept drift and transfer learning situations. Patching adapts quickly and achieves high classification accuracy, outperforming state-of-the-art competitors in either adaptation speed or accuracy in many scenarios.
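The patching workflow described above might look roughly like the following sketch. It is not the authors' implementation: the class name `PatchedClassifier`, the method `fit_patch`, and the use of a 1-nearest-neighbour rule both as the error-region detector and as the patch model are assumptions made to keep the example dependency-free.

```python
def nearest(x, points):
    """Index of the stored point closest to x (squared Euclidean distance)."""
    return min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])))


class PatchedClassifier:
    """Wraps a fixed black-box classifier and patches its error regions."""

    def __init__(self, base_predict):
        self.base_predict = base_predict  # black box, never retrained

    def fit_patch(self, X, y):
        """Use a labelled batch to learn where the base model errs."""
        self.X = list(X)
        # Flag each instance the base model misclassified
        self.base_wrong = [self.base_predict(x) != t for x, t in zip(X, y)]
        # The patch model is trained only on the misclassified instances
        self.err_X = [x for x, w in zip(X, self.base_wrong) if w]
        self.err_y = [t for t, w in zip(y, self.base_wrong) if w]
        return self

    def predict(self, x):
        # Error-region detector: was the base model wrong on the nearest seen point?
        if self.base_wrong[nearest(x, self.X)]:
            return self.err_y[nearest(x, self.err_X)]  # patched prediction
        return self.base_predict(x)                    # base model is trusted here


# Hypothetical base model and a drifted batch: large x values are now class 0
base = lambda x: int(x[0] > 0)
X = [(-2.0,), (-1.0,), (1.0,), (2.0,), (6.0,), (7.0,)]
y = [0, 0, 1, 1, 0, 0]
clf = PatchedClassifier(base).fit_patch(X, y)
```

Outside the detected error region the wrapper defers entirely to the base model, which is why the approach depends on that model being strong to begin with.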


Transfer learning: leveraging insights from large data sets

#artificialintelligence

In this blog post, you'll learn what transfer learning is, what some of its applications are, and why it is a critical skill for a data scientist. Transfer learning is not a machine learning model or technique; it is rather a 'design methodology' within machine learning. Another type of 'design methodology' is, for example, active learning. This blog post is the first in a series on transfer learning. You can find the second blog post, which discusses two applications of transfer learning, here.


AFT*: Integrating Active Learning and Transfer Learning to Reduce Annotation Efforts

arXiv.org Machine Learning

The splendid success of convolutional neural networks (CNNs) in computer vision is largely attributed to the availability of large annotated datasets, such as ImageNet and Places. However, in biomedical imaging, it is very challenging to create such large annotated datasets, as annotating biomedical images is not only tedious, laborious, and time consuming, but also demanding of costly, specialty-oriented skills, which are not easily accessible. To dramatically reduce annotation cost, this paper presents a novel method to naturally integrate active learning and transfer learning (fine-tuning) into a single framework, called AFT*, which starts directly with a pre-trained CNN to seek "worthy" samples for annotation and gradually enhance the (fine-tuned) CNN via continuous fine-tuning. We have evaluated our method in three distinct biomedical imaging applications, demonstrating that it can cut the annotation cost by at least half, in comparison with the state-of-the-art method. This performance is attributed to the several advantages derived from the advanced active, continuous learning capability of our method. Although AFT* was initially conceived in the context of computer-aided diagnosis in biomedical imaging, it is generic and applicable to many tasks in computer vision and image analysis; we illustrate the key ideas behind AFT* with the Places database for scene interpretation in natural images.
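The sample-selection step of such an active fine-tuning loop can be sketched as follows. The abstract does not define what makes a sample "worthy", so this example assumes plain predictive entropy as the uncertainty score; that is a common active learning heuristic and should not be read as the AFT* criterion. The function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, k):
    """Return indices of the k unlabelled samples the model is least sure about.

    predictions -- one class-probability list per unlabelled sample,
                   as produced by the current (fine-tuned) model
    """
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical model outputs for four unlabelled images
predictions = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [1.0, 0.0]]
to_annotate = select_for_annotation(predictions, k=2)
```

In the full loop, the selected samples would be sent to an annotator, added to the labelled pool, and used to fine-tune the model again before the next selection round.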


Transfer Learning using differential learning rates

@machinelearnbot

In this post, I will be sharing how one can use popular deep learning models for their own specific task using transfer learning. We will cover some concepts, like differential learning rates, which are not yet implemented in some of the deep learning libraries. I learned about these from the fast.ai course, whose content will be available to the general public in early 2018 as a MOOC. Transfer learning is the process of using the knowledge learned in one process/activity and applying it to a different task. Let us take a small example: a player who is good at carroms can apply that knowledge in learning how to play a game of pool.
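As a rough illustration of differential learning rates, the sketch below performs one plain SGD update in which each group of layers has its own learning rate. It assumes the usual fine-tuning convention that early, general-purpose layers get small rates while the newly added task-specific layers get larger ones; the three-group split and all numbers are hypothetical.

```python
def sgd_step(layer_groups, grads, learning_rates):
    """One SGD update where each layer group uses its own learning rate.

    layer_groups   -- list of groups, each a list of weights (floats)
    grads          -- gradients, same nested shape as layer_groups
    learning_rates -- one learning rate per group
    """
    return [
        [w - lr * g for w, g in zip(weights, group_grads)]
        for weights, group_grads, lr in zip(layer_groups, grads, learning_rates)
    ]


# Hypothetical split: early layers, middle layers, newly added head
layer_groups = [[1.0, 1.0], [1.0], [1.0]]
grads = [[1.0, 1.0], [1.0], [1.0]]
learning_rates = [0.001, 0.01, 0.1]  # smaller steps for pretrained early layers

updated = sgd_step(layer_groups, grads, learning_rates)
```

With identical gradients, the early layers barely move while the head changes the most, which is the intended effect: pretrained features are preserved and the new layers adapt quickly.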