Goto

Collaborating Authors

 multitask learning


Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning

Neural Information Processing Systems

Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel confidence intervals for multitask regression in the challenging agnostic setting, i.e., when neither the similarity between tasks nor the tasks' features are available to the learner. The obtained intervals do not require i.i.d.


Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

Neural Information Processing Systems

Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called {\em Sparse Overlapping Sets (SOS) lasso}, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multi-subject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso.


Multitask learning meets tensor factorization: task imputation via convex optimization

Neural Information Processing Systems

We study a multitask learning problem in which each task is parametrized by a weight vector and indexed by a pair of indices, which can be e.g, (consumer, time). The weight vectors can be collected into a tensor and the (multilinear-)rank of the tensor controls the amount of sharing of information among tasks. Two types of convex relaxations have recently been proposed for the tensor multilinear rank. However, we argue that both of them are not optimal in the context of multitask learning in which the dimensions or multilinear rank are typically heterogeneous. We propose a new norm, which we call the scaled latent trace norm and analyze the excess risk of all the three norms. The results apply to various settings including matrix and tensor completion, multitask learning, and multilinear multitask learning. Both the theory and experiments support the advantage of the new norm when the tensor is not equal-sized and we do not a priori know which mode is low rank.


Measuring Fine-Grained Relatedness in Multitask Learning via Data Attribution

arXiv.org Artificial Intelligence

Measuring task relatedness and mitigating negative transfer remain a critical open challenge in Multitask Learning (MTL). This work extends data attribution -- which quantifies the influence of individual training data points on model predictions -- to MTL setting for measuring task relatedness. We propose the MultiTask Influence Function (MTIF), a method that adapts influence functions to MTL models with hard or soft parameter sharing. Compared to conventional task relatedness measurements, MTIF provides a fine-grained, instance-level relatedness measure beyond the entire-task level. This fine-grained relatedness measure enables a data selection strategy to effectively mitigate negative transfer in MTL. Through extensive experiments, we demonstrate that the proposed MTIF efficiently and accurately approximates the performance of models trained on data subsets. Moreover, the data selection strategy enabled by MTIF consistently improves model performance in MTL. Our work establishes a novel connection between data attribution and MTL, offering an efficient and fine-grained solution for measuring task relatedness and enhancing MTL models.


Learning Beyond Limits: Multitask Learning and Synthetic Data for Low-Resource Canonical Morpheme Segmentation

arXiv.org Artificial Intelligence

We introduce a transformer-based morpheme segmentation system that augments a low-resource training signal through multitask learning and LLM-generated synthetic data. Our framework jointly predicts morphological segments and glosses from orthographic input, leveraging shared linguistic representations obtained through a common documentary process to enhance model generalization. To further address data scarcity, we integrate synthetic training data generated by large language models (LLMs) using in-context learning. Experimental results on the SIGMORPHON 2023 dataset show that our approach significantly improves word-level segmentation accuracy and morpheme-level F1-score across multiple low-resource languages.


Multitask learning meets tensor factorization: task imputation via convex optimization

Neural Information Processing Systems

We study a multitask learning problem in which each task is parametrized by a weight vector and indexed by a pair of indices, which can be e.g, (consumer, time). The weight vectors can be collected into a tensor and the (multilinear-)rank of the tensor controls the amount of sharing of information among tasks. Two types of convex relaxations have recently been proposed for the tensor multilinear rank. However, we argue that both of them are not optimal in the context of multitask learning in which the dimensions or multilinear rank are typically heterogeneous. We propose a new norm, which we call the scaled latent trace norm and analyze the excess risk of all the three norms. The results apply to various settings including matrix and tensor completion, multitask learning, and multilinear multitask learning. Both the theory and experiments support the advantage of the new norm when the tensor is not equal-sized and we do not a priori know which mode is low rank.


Privacy in Metalearning and Multitask Learning: Modeling and Separations

arXiv.org Artificial Intelligence

Model personalization allows a set of individuals, each facing a different learning task, to train models that are more accurate for each person than those they could develop individually. For example, consider a set of people, each of whom holds a relatively small dataset of photographs labeled with the names of their loved ones that appear in each picture. Each person would like to build a classifier that labels future pictures with the names of people in the picture, but training such an image classifier would take more data than any individual person has. Even though the tasks they want to carry out are different--their photos have different subjects--those tasks share a lot of common structure. By pooling their data, a large group of people could learn the shared components of a good set of classifiers. Each individual could then train the subject-specific components on their own, requiring only a few examples for each subject. Other applications of personalization include next-word prediction on a mobile keyboard, speech recognition, and recommendation systems. The goals of personalization are captured in a variety of formal frameworks, such as multitask learning and metalearning.


Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning

Neural Information Processing Systems

Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel confidence intervals for multitask regression in the challenging agnostic setting, i.e., when neither the similarity between tasks nor the tasks' features are available to the learner. The obtained intervals do not require i.i.d. Through a refined analysis of the multitask information gain, we obtain new regret guarantees that, depending on a task similarity parameter, can significantly improve over treating tasks independently.


Reviews: Learning to Multitask

Neural Information Processing Systems

This paper presents an end-to-end framework for multitask learning called L2MT. The L2MT is trained on previous multitask problems, and selects the best multitask model for a new multitask problem. Authors also proposed a effective algorithm for selecting the best model on test data. Pros: - This is an clean and straightforward framework for multitask learning, and it is end-to-end. Cons: - The'label' in this learning task is the relative test error of eps_{MTL}/eps_{STL}. In my understanding, eps_{STL} is roughly measuring the level of difficulty of a multitask problem, but is there a better baseline to use here?


Learning Representation for Multitask learning through Self Supervised Auxiliary learning

arXiv.org Machine Learning

Multi-task learning is a popular machine learning approach that enables simultaneous learning of multiple related tasks, improving algorithmic efficiency and effectiveness. In the hard parameter sharing approach, an encoder shared through multiple tasks generates data representations passed to task-specific predictors. Therefore, it is crucial to have a shared encoder that provides decent representations for every and each task. However, despite recent advances in multi-task learning, the question of how to improve the quality of representations generated by the shared encoder remains open. To address this gap, we propose a novel approach called Dummy Gradient norm Regularization that aims to improve the universality of the representations generated by the shared encoder. Specifically, the method decreases the norm of the gradient of the loss function with repect to dummy task-specific predictors to improve the universality of the shared encoder's representations. Through experiments on multiple multi-task learning benchmark datasets, we demonstrate that DGR effectively improves the quality of the shared representations, leading to better multi-task prediction performances. Applied to various classifiers, the shared representations generated by DGR also show superior performance compared to existing multi-task learning methods. Moreover, our approach takes advantage of computational efficiency due to its simplicity. The simplicity also allows us to seamlessly integrate DGR with the existing multi-task learning algorithms.