Goto

Collaborating Authors

 Transfer Learning


Sparse coding for multitask and transfer learning

arXiv.org Machine Learning

We investigate the use of sparse coding and dictionary learning in the context of multitask and transfer learning. The central assumption of our learning method is that the tasks parameters are well approximated by sparse linear combinations of the atoms of a dictionary on a high or infinite dimensional space. This assumption, together with the large quantity of available data in the multitask and transfer learning settings, allows a principled choice of the dictionary. We provide bounds on the generalization error of this approach, for both settings. Numerical experiments on one synthetic and two real datasets show the advantage of our method over single task learning, a previous method based on orthogonal and dense representation of the tasks and a related method learning task grouping.


An Inequality with Applications to Structured Sparsity and Multitask Dictionary Learning

arXiv.org Machine Learning

From concentration inequalities for the suprema of Gaussian or Rademacher processes an inequality is derived. It is applied to sharpen existing and to derive novel bounds on the empirical Rademacher complexities of unit balls in various norms appearing in the context of structured sparsity and multitask dictionary learning or matrix factorization. A key role is played by the largest eigenvalue of the data covariance matrix. Keywords: Concentration inequalities, multitask learning, Rademacher complexity, risk bounds, structured sparsity.


A PAC-Bayesian bound for Lifelong Learning

arXiv.org Machine Learning

Transfer learning has received a lot of attention in the machine learning community over the last years, and several effective algorithms have been developed. However, relatively little is known about their theoretical properties, especially in the setting of lifelong learning, where the goal is to transfer information to tasks for which no data have been observed so far. In this work we study lifelong learning from a theoretical perspective. Our main result is a PAC-Bayesian generalization bound that offers a unified view on existing paradigms for transfer learning, such as the transfer of parameters or the transfer of low-dimensional representations. We also use the bound to derive two principled lifelong learning algorithms, and we show that these yield results comparable with existing methods.


Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

Neural Information Processing Systems

Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called {\em Sparse Overlapping Sets (SOS) lasso}, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.


Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

arXiv.org Machine Learning

Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi- subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.


Multi-View Discriminant Transfer Learning

AAAI Conferences

We study to incorporate multiple views of data in a perceptive transfer learning framework and propose a Multi-view Discriminant Transfer (MDT) learning approach for domain adaptation. The main idea is to find the optimal discriminant weight vectors for each view such that the correlation between the two-view projected data is maximized, while both the domain discrepancy and the view disagreement are minimized simultaneously. Furthermore, we analyze MDT theoretically from discriminant analysis perspective to explain the condition and reason, under which the proposed method is not applicable. The analytical results allow us to investigate whether there exist within-view and/or between-view conflicts, and thus provides a deep insight into whether the transfer learning algorithm work properly or not in the view-based problems and the combined learning problem. Experiments show that MDT significantly outperforms the state-of-the-art baselines including some typical multi-view learning approaches in single- or cross-domain.


Active Transfer Learning for Cross-System Recommendation

AAAI Conferences

Recommender systems, especially the newly launched ones, have to deal with the data-sparsity issue, where little existing rating information is available. Recently, transfer learning has been proposed to address this problem by leveraging the knowledge from related recommender systems where rich collaborative data are available. However, most previous transfer learning models assume that entity-correspondences across different systems are given as input, which means that for any entity (e.g., a user or an item) in a target system, its corresponding entity in a source system is known. This assumption can hardly be satisfied in real-world scenarios where entity-correspondences across systems are usually unknown, and the cost of identifying them can be expensive. For example, it is extremely difficult to identify whether a user A from Facebook and a user B from Twitter are the same person. In this paper, we propose a framework to construct entity correspondence with limited budget by using active learning to facilitate knowledge transfer across recommender systems. Specifically, for the purpose of maximizing knowledge transfer, we first iteratively select entities in the target system based on our proposed criterion to query their correspondences in the source system. We then plug the actively constructed entity-correspondence mapping into a general transferred collaborative-filtering model to improve recommendation quality. We perform extensive experiments on real world datasets to verify the effectiveness of our proposed framework for this cross-system recommendation problem.


A Transfer Learning Approach for Learning Temporal Nodes Bayesian Networks

AAAI Conferences

Situations where there is insufficient information to learn from often arise, and the process to recollect data can be expensive or in some cases take too long resulting in outdated models. Transfer learning strategies have proven to be a powerful technique to learn models from several sources when a single source does not provide enough information. In this work we present a methodology to learn a Temporal Nodes Bayesian Network by transferring knowledge from several different but related domains. Experiments based on a reference network show promising results, supporting our claim that transfer learning is a viable strategy to learn these models when scarce data is available.


Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract)

AAAI Conferences

When transferring knowledge between reinforcement learning agents with different state representations or actions, past knowledge must be efficiently mapped so that it assists learning. The majority of the existing approaches use pre-defined mappings given by a domain expert. To overcome this limitations and allow autonomous transfer learning, this paper introduces a method for weighting and using multiple inter-task mappings, named COMBREL. Experimental results show that the use of multiple inter-task mappings, accompanied with a selection mechanism, can significantly boost the performance of transfer learning, relative to learning without transfer and relative to using a single hand-picked mapping.


Excess risk bounds for multitask learning with trace norm regularization

arXiv.org Machine Learning

A fundamental limitation of supervised learning is the cost incurred by the preparation of the large training samples required for good generalization. A potential remedy is offered by multi-task learning: in many cases, while individual sample sizes are rather small, there are samples to represent a large number of learning tasks, which share some constraining or generative property. This common property can be estimated using the entire collection of training samples, and if this property is sufficiently simple it should allow better estimation of the individual tasks from small individual samples. The machine learning community has tried multi-task learning for many years (see [3, 4, 12, 13, 14, 20, 21, 26], contributions and references therein), but there are few theoretical investigations which clearly expose the conditions under which multi-task learning is preferable to independent learning. Following the seminal work of Baxter ([7, 8]) several authors have given generalization and 1 performance bounds under different assumptions of task-relatedness. In this paper we consider multi-task learning with trace-norm regularization (TNML), a technique for which efficient algorithms exist and which has been successfully applied many times (see e.g.