Collaborating Authors

multi-task learning

MLR: A Two-stage Conversational Query Rewriting Model with Multi-task Learning Artificial Intelligence

Conversational context understanding aims to recognize the real intention of user from the conversation history, which is critical for building the dialogue system. However, the multi-turn conversation understanding in open domain is still quite challenging, which requires the system extracting the important information and resolving the dependencies in contexts among a variety of open topics. In this paper, we propose the conversational query rewriting model - MLR, which is a Multi-task model on sequence Labeling and query Rewriting. MLR reformulates the multi-turn conversational queries into a single turn query, which conveys the true intention of users concisely and alleviates the difficulty of the multi-turn dialogue modeling. In the model, we formulate the query rewriting as a sequence generation problem and introduce word category information via the auxiliary word category label predicting task. To train our model, we construct a new Chinese query rewriting dataset and conduct experiments on it. The experimental results show that our model outperforms compared models, and prove the effectiveness of the word category information in improving the rewriting performance.

Distributed Primal-Dual Optimization for Online Multi-Task Learning Machine Learning

Conventional online multi-task learning algorithms suffer from two critical limitations: 1) Heavy communication caused by delivering high velocity of sequential data to a central machine; 2) Expensive runtime complexity for building task relatedness. To address these issues, in this paper we consider a setting where multiple tasks are geographically located in different places, where one task can synchronize data with others to leverage knowledge of related tasks. Specifically, we propose an adaptive primal-dual algorithm, which not only captures task-specific noise in adversarial learning but also carries out a projection-free update with runtime efficiency. Moreover, our model is well-suited to decentralized periodic-connected tasks as it allows the energy-starved or bandwidth-constraint tasks to postpone the update. Theoretical results demonstrate the convergence guarantee of our distributed algorithm with an optimal regret. Empirical results confirm that the proposed model is highly effective on various real-world datasets.

MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning Machine Learning

We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). Existing NAS methods typically define different search spaces according to different tasks. In order to adapt to different task combinations (i.e., task sets), we disentangle the GP-MTL networks into single-task backbones (optionally encode the task priors), and a hierarchical and layerwise features sharing/fusing scheme across them. This enables us to design a novel and general task-agnostic search space, which inserts cross-task edges (i.e., feature fusion connections) into fixed single-task network backbones. Moreover, we also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture. This is realized with a minimum entropy regularization on the architecture weights during the search phase, which makes the architecture weights converge to near-discrete values and therefore achieves a single model. As a result, our searched model can be directly used for evaluation without (re-)training from scratch. We perform extensive experiments using different single-task backbones on various task sets, demonstrating the promising performance obtained by exploiting the hierarchical and layerwise features, as well as the desirable generalizability to different i) task sets and ii) single-task backbones. The code of our paper is available at

Multi-Task Reinforcement Learning with Soft Modularization Artificial Intelligence

Multi-task learning is a very challenging problem in reinforcement learning. While training multiple tasks jointly allow the policies to share parameters across different tasks, the optimization problem becomes non-trivial: It is unclear what parameters in the network should be reused across tasks, and the gradients from different tasks may interfere with each other. Thus, instead of naively sharing parameters across tasks, we introduce an explicit modularization technique on policy representation to alleviate this optimization issue. Given a base policy network, we design a routing network which estimates different routing strategies to reconfigure the base network for each task. Instead of creating a concrete route for each task, our task-specific policy is represented by a soft combination of all possible routes. We name this approach soft modularization. We experiment with multiple robotics manipulation tasks in simulation and show our method improves sample efficiency and performance over baselines by a large margin.

Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent Machine Learning

In deep multi-task learning, weights of task-specific networks are shared between tasks to improve performance on each single one. Since the question, which weights to share between layers, is difficult to answer, human-designed architectures often share everything but a last task-specific layer. In many cases, this simplistic approach severely limits performance. Instead, we propose an algorithm to learn the assignment between a shared set of weights and task-specific layers. To optimize the non-differentiable assignment and at the same time train the differentiable weights, learning takes place via a combination of natural evolution strategy and stochastic gradient descent. The end result are task-specific networks that share weights but allow independent inference. They achieve lower test errors than baselines and methods from literature on three multi-task learning datasets.

Multi-task Learning for Aggregated Data using Gaussian Processes

Neural Information Processing Systems

Aggregated data is commonplace in areas such as epidemiology and demography. For example, census data for a population is usually given as averages defined over time periods or spatial resolutions (cities, regions or countries). In this paper, we present a novel multi-task learning model based on Gaussian processes for joint learning of variables that have been aggregated at different input scales. Our model represents each task as the linear combination of the realizations of latent processes that are integrated at a different scale per task. We are then able to compute the cross-covariance between the different tasks either analytically or numerically.

Pareto Multi-Task Learning

Neural Information Processing Systems

Multi-task learning is a powerful method for solving multiple correlated tasks simultaneously. However, it is often impossible to find one single solution to optimize all the tasks, since different tasks might conflict with each other. Recently, a novel method is proposed to find one single Pareto optimal solution with good trade-off among different tasks by casting multi-task learning as multiobjective optimization. In this paper, we generalize this idea and propose a novel Pareto multi-task learning algorithm (Pareto MTL) to find a set of well-distributed Pareto solutions which can represent different trade-offs among different tasks. The proposed algorithm first formulates a multi-task learning problem as a multiobjective optimization problem, and then decomposes the multiobjective optimization problem into a set of constrained subproblems with different trade-off preferences.

Survey of Personalization Techniques for Federated Learning Machine Learning

Federated learning enables machine learning models to learn from private decentralized data without compromising privacy. The standard formulation of federated learning produces one shared model for all clients. Statistical heterogeneity due to non-IID distribution of data across devices often leads to scenarios where, for some clients, the local models trained solely on their private data perform better than the global shared model thus taking away their incentive to participate in the process. Several techniques have been proposed to personalize global models to work better for individual clients. This paper highlights the need for personalization and surveys recent research on this topic.

Interpretable MTL from Heterogeneous Domains using Boosted Tree Machine Learning

Multi-task learning (MTL) aims at improving the generalization performance of several related tasks by leveraging useful information contained in them. However, in industrial scenarios, interpretability is always demanded, and the data of different tasks may be in heterogeneous domains, making the existing methods unsuitable or unsatisfactory. In this paper, following the philosophy of boosted tree, we proposed a two-stage method. In stage one, a common model is built to learn the commonalities using the common features of all instances. Different from the training of conventional boosted tree model, we proposed a regularization strategy and an early-stopping mechanism to optimize the multi-task learning process. In stage two, started by fitting the residual error of the common model, a specific model is constructed with the task-specific instances to further boost the performance. Experiments on both benchmark and real-world datasets validate the effectiveness of the proposed method. What's more, interpretability can be naturally obtained from the tree based method, satisfying the industrial needs.

Unified Multi-Domain Learning and Data Imputation using Adversarial Autoencoder Machine Learning

We present a novel framework that can combine multi-domain learning (MDL), data imputation (DI) and multi-task learning (MTL) to improve performance for classification and regression tasks in different domains. The core of our method is an adversarial autoencoder that can: (1) learn to produce domain-invariant embeddings to reduce the difference between domains; (2) learn the data distribution for each domain and correctly perform data imputation on missing data. For MDL, we use the Maximum Mean Discrepancy (MMD) measure to align the domain distributions. For DI, we use an adversarial approach where a generator fill in information for missing data and a discriminator tries to distinguish between real and imputed values. Finally, using the universal feature representation in the embeddings, we train a classifier using MTL that given input from any domain, can predict labels for all domains. We demonstrate the superior performance of our approach compared to other state-of-art methods in three distinct settings, DG-DI in image recognition with unstructured data, MTL-DI in grade estimation with structured data and MDMTL-DI in a selection process using mixed data.