multi-task learning

The Future of Machine Learning - KDnuggets


Machine learning is a trendy topic in this age of Artificial Intelligence. The fields of computer vision and Natural Language Processing (NLP) are making breakthroughs that no one could've predicted. We see both of them in our lives more and more, facial recognition in your smartphones, language translation software, self-driving cars and so on. What might seem sci-fi is becoming a reality, and it is only a matter of time before we attain Artificial General Intelligence. In this article, I will be covering Jeff Dean's keynote on the advancements of computer vision and language models and how ML will progress towards the future from the perspective of model building.

Multi-Task Learning via Conic Programming

Neural Information Processing Systems

When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as \emph{uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems.

Infinite Latent SVM for Classification and Multi-task Learning

Neural Information Processing Systems

Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes' theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

A Dirty Model for Multi-task Learning

Neural Information Processing Systems

We consider the multiple linear regression problem, in a setting where some of the set of relevant features could be shared across the tasks. However, these papers also caution that the performance of such block-regularized methods are very dependent on the {\em extent} to which the features are shared across tasks. We are far away from a realistic multi-task setting: not only do the set of relevant features have to be exactly the same across tasks, but their values have to as well. Here, we ask the question: can we leverage support and parameter overlap when it exists, but not pay a penalty when it does not? Indeed, this falls under a more general question of whether we can model such \emph{dirty data} which may not fall into a single neat structural bracket (all block-sparse, or all low-rank and so on).

Clustered Multi-Task Learning: A Convex Formulation

Neural Information Processing Systems

In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We show in simulations on synthetic examples and on the iedb MHC-I binding dataset, that our approach outperforms well-known convex methods for multi-task learning, as well as related non convex methods dedicated to the same problem. Papers published at the Neural Information Processing Systems Conference.

Clustered Multi-Task Learning Via Alternating Structure Optimization

Neural Information Processing Systems

Multi-task learning (MTL) learns multiple related tasks simultaneously to improve generalization performance. Alternating structure optimization (ASO) is a popular MTL method that learns a shared low-dimensional predictive structure on hypothesis spaces from multiple related tasks. It has been applied successfully in many real world applications. As an alternative MTL approach, clustered multi-task learning (CMTL) assumes that multiple tasks follow a clustered structure, i.e., tasks are partitioned into a set of groups where tasks in the same group are similar to each other, and that such a clustered structure is unknown a priori. The objectives in ASO and CMTL differ in how multiple tasks are related.

Multi-task Vector Field Learning

Neural Information Processing Systems

Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously and identifying the shared information among tasks. Most of existing MTL methods focus on learning linear models under the supervised setting. We propose a novel semi-supervised and nonlinear approach for MTL using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. We argue that vector fields provide a natural way to exploit the geometric structure of data as well as the shared differential structure of tasks, both are crucial for semi-supervised multi-task learning.

Multi-task Learning for Aggregated Data using Gaussian Processes

Neural Information Processing Systems

Aggregated data is commonplace in areas such as epidemiology and demography. For example, census data for a population is usually given as averages defined over time periods or spatial resolutions (cities, regions or countries). In this paper, we present a novel multi-task learning model based on Gaussian processes for joint learning of variables that have been aggregated at different input scales. Our model represents each task as the linear combination of the realizations of latent processes that are integrated at a different scale per task. We are then able to compute the cross-covariance between the different tasks either analytically or numerically.

Learning Feature Selection Dependencies in Multi-task Learning

Neural Information Processing Systems

A probabilistic model based on the horseshoe prior is proposed for learning dependencies in the process of identifying relevant features for prediction. Exact inference is intractable in this model. However, expectation propagation offers an approximate alternative. Because the process of estimating feature selection dependencies may suffer from over-fitting in the model proposed, additional data from a multi-task learning scenario are considered for induction. The same model can be used in this setting with few modifications.

Multi-Task Learning for Contextual Bandits

Neural Information Processing Systems

Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setting, the goal is to leverage similarities in contexts for different arms so as to improve the agent's ability to predict rewards from contexts. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. We also describe an effective scheme for estimating task similarity from data, and demonstrate our algorithm's performance on several data sets.