latent task

Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

Zhao, Yukun, Yan, Lingyong, Li, Zhenyang, Wang, Shuaiqiang, Chen, Zhumin, Ren, Zhaochun, Yin, Dawei

arXiv.org Artificial Intelligence

Large language models have achieved remarkable success across a wide range of tasks. However, learning new tasks incrementally remains challenging for them because of catastrophic forgetting. Existing approaches rely on experience replay, optimization constraints, or task differentiation, all of which face strict limitations in real-world scenarios. To address these issues, we propose Joint Flashback Adaptation. We first introduce flashbacks -- a limited number of prompts from old tasks -- when adapting to new tasks, and constrain the deviation of the model's outputs from those of the original model. We then interpolate latent tasks between flashbacks and new tasks, enabling the model to jointly learn relevant latent tasks, new tasks, and flashbacks; this alleviates data sparsity in the flashbacks and facilitates knowledge sharing for smooth adaptation. Our method requires only a limited number of flashbacks, without access to the replay data, and is task-agnostic. We conduct extensive experiments on state-of-the-art large language models across 1000+ instruction-following tasks, arithmetic reasoning tasks, and general reasoning tasks. The results demonstrate the superior performance of our method in improving generalization on new tasks and reducing forgetting on old tasks.
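The training objective described in this abstract -- new-task loss plus a constraint keeping the adapted model's outputs on flashback prompts close to the original model's -- can be sketched as below. The KL-divergence form of the constraint and the `lam` weight are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def flashback_loss(new_logits_task, labels_task, new_logits_fb, old_logits_fb, lam=0.5):
    """Cross-entropy on the new task plus a KL penalty that keeps the
    adapted model's outputs on flashback prompts close to the frozen
    original model's outputs (illustrative sketch of the idea)."""
    # standard cross-entropy on the new-task batch
    p_new_task = softmax(new_logits_task)
    ce = -np.mean(np.log(p_new_task[np.arange(len(labels_task)), labels_task]))
    # KL(original || adapted) on the flashback prompts
    p_old = softmax(old_logits_fb)
    p_new = softmax(new_logits_fb)
    kl = np.mean(np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=-1))
    return ce + lam * kl
```

When the adapted model agrees with the original on the flashbacks, the penalty vanishes and only the new-task loss remains; any drift on old-task prompts is charged against the objective.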


Reviews: Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

Neural Information Processing Systems

The paper identifies the unsolved problem of meta-inverse reinforcement learning: learning a reward function for an unseen task from a single expert trajectory for that task, using a batch of expert trajectories from different but related tasks as training data (the task solved by each training trajectory is not communicated to the learning algorithm). Because IRL is used rather than imitation learning, a reward function is learned for each task (or rather, a single reward function parameterized by the latent variable m, which is intended to capture the task). The paper then formulates a framework for training neural networks to solve the identified problem, building on past work on Adversarial IRL and adding latent task variables to handle the variation across tasks. A network q_psi is used to identify the task variable from a demonstration.


Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios

Shaj, Vaisakh, Buchler, Dieter, Sonker, Rohit, Becker, Philipp, Neumann, Gerhard

arXiv.org Artificial Intelligence

Recurrent state-space models (RSSMs) are highly expressive models for learning patterns in time-series data and for system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications involve tasks with similar but not identical dynamics, which can be modeled with a latent variable. We introduce Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors. We present a simple and effective way of learning and performing inference over this Gaussian graphical model that avoids approximations such as variational inference. We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks, both on real-world systems and in simulation.
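The core idea -- a shared dynamics model whose behavior is modulated by a per-task latent parameter inferred from a short context window -- can be illustrated with a toy scalar analogue. The linear dynamics and the residual-averaging inference below are simplifying assumptions for illustration, not the paper's Gaussian graphical model:

```python
import numpy as np

def infer_latent(context_x, context_x_next, a=0.9):
    """Infer a scalar latent offset theta from observed context
    transitions, assuming the simplified dynamics x' = a*x + theta
    (toy analogue of a per-task hidden parameter)."""
    residuals = context_x_next - a * context_x
    return residuals.mean()

def predict(x, theta, a=0.9):
    """One-step prediction under the task-specific latent parameter."""
    return a * x + theta
```

Two tasks with different theta share the same transition function `predict`; only the inferred latent changes, which is the sense in which one model covers a family of related dynamical systems.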


Modelling the influence of data structure on learning in neural networks

Goldt, Sebastian, Mézard, Marc, Krzakala, Florent, Zdeborová, Lenka

arXiv.org Machine Learning

The lack of crisp mathematical models that capture the structure of real-world data sets is a major obstacle to the detailed theoretical understanding of deep neural networks. Here, we first demonstrate the effect of structured data sets by experimentally comparing the dynamics and the performance of two-layer networks trained on two different data sets: (i) an unstructured synthetic data set containing random i.i.d. inputs, and (ii) a simple canonical data set containing MNIST images. Our analysis reveals two phenomena related to the dynamics of the networks and their ability to generalise that only appear when training on structured data sets. Second, we introduce a generative model for data sets, where high-dimensional inputs lie on a lower-dimensional manifold and have labels that depend only on their position within this manifold. We call it the hidden manifold model, and we experimentally demonstrate that training networks on data sets drawn from this model reproduces both phenomena seen during training on MNIST.
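The generative model described here -- low-dimensional latent coordinates projected into a high-dimensional input space, with labels depending only on the latent position -- can be sketched as follows. The `tanh` nonlinearity and sign labels are illustrative choices, not necessarily the paper's exact setup:

```python
import numpy as np

def hidden_manifold_sample(n, d_latent, d_input, rng):
    """Sample a data set in the spirit of the hidden manifold model:
    latent coordinates C live in d_latent dimensions, are projected up
    by a fixed random feature map F, and pass through a nonlinearity;
    labels depend only on C, not on the ambient representation X."""
    F = rng.standard_normal((d_latent, d_input))   # fixed projection
    C = rng.standard_normal((n, d_latent))         # positions on the manifold
    X = np.tanh(C @ F / np.sqrt(d_latent))         # high-dimensional inputs
    w = rng.standard_normal(d_latent)              # label rule on the manifold
    y = np.sign(C @ w)
    return X, y
```

Although X is d_input-dimensional, every input is a deterministic function of its d_latent-dimensional coordinates, so the data has intrinsic low-dimensional structure by construction.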


Self-Paced Multi-Task Learning

Li, Changsheng (East China Normal University) | Yan, Junchi (East China Normal University) | Wei, Fan (Stanford University) | Dong, Weishan (IBM Research - China) | Liu, Qingshan (Nanjing University of Information Science and Technology) | Zha, Hongyuan (East China Normal University)

AAAI Conferences

Multi-task learning is a paradigm in which multiple tasks are learnt jointly. Previous multi-task learning models usually treat all tasks, and all instances per task, equally during learning. Inspired by the fact that humans often learn from easy concepts to hard ones, in this paper we propose a novel multi-task learning framework that learns the tasks while simultaneously taking into consideration the complexities of both tasks and instances per task. We propose a novel formulation with a new task-oriented regularizer that can jointly prioritize tasks and instances; thus it can be interpreted as a self-paced learner for multi-task learning. An efficient block coordinate descent algorithm is developed to solve the proposed objective function, and the convergence of the algorithm is guaranteed. Experimental results on toy and real-world datasets demonstrate the effectiveness of the proposed approach compared to state-of-the-art methods.
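The self-paced idea referenced here can be sketched with the classic hard-threshold regularizer: instances enter training once their loss falls below a pace parameter. This is the standard single-level self-paced objective; the paper's task-oriented regularizer additionally prioritizes whole tasks, which this sketch omits:

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Binary self-paced weights: include an instance once its current
    loss is below the pace parameter lam (easy examples first)."""
    return (losses < lam).astype(float)

def paced_objective(losses, lam):
    """Weighted loss minus lam * sum(v): the classic self-paced
    objective, whose optimal v is exactly the hard threshold above."""
    v = self_paced_weights(losses, lam)
    return float(np.sum(v * losses) - lam * np.sum(v)), v
```

As lam grows over the course of training, harder instances (and in the paper's full formulation, harder tasks) are gradually admitted into the objective.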


Learning Implicit Tasks for Patient-Specific Risk Modeling in ICU

Nori, Nozomi (Kyoto University) | Kashima, Hisashi (Kyoto University) | Yamashita, Kazuto (Kyoto University) | Kunisawa, Susumu (Kyoto University) | Imanaka, Yuichi (Kyoto University)

AAAI Conferences

Accurate assessment of the severity of a patient’s condition plays a fundamental role in acute hospital care such as that provided in an intensive care unit (ICU). ICU clinicians are required to make sense of a large amount of clinical data in a limited time to estimate the severity of a patient’s condition, which ultimately leads to the planning of appropriate care. The ICU is an especially demanding environment for clinicians because of the diversity of patients who mostly suffer from multiple diseases of various types. In this paper, we propose a mortality risk prediction method for ICU patients. The method is intended to enhance the severity assessment by considering the diversity of patients. Our method produces patient-specific risk models that reflect the collection of diseases associated with the patient. Specifically, we assume a small number of latent basis tasks, where each latent task is associated with its own parameter vector; a parameter vector for a specific patient is constructed as a linear combination of these. The latent representation of a patient, namely, the coefficients of the combination, is learned based on the collection of diseases associated with the patient. Our method could be considered a multi-task learning method where latent tasks are learned based on the collection of diseases. We demonstrate the effectiveness of our proposed method using a dataset collected from a hospital. Our method achieved higher predictive performance compared with a single-task learning method, the “de facto standard,” and several multi-task learning methods including a recently proposed method for ICU mortality risk prediction. Furthermore, our proposed method could be used not only for predictions but also for uncovering patient-specificity from different viewpoints.
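The parameter structure described in this abstract -- a patient-specific weight vector built as a linear combination of latent basis tasks, with coefficients derived from the patient's diseases -- can be sketched as below. The linear map `M` from disease vector to coefficients is a hypothetical stand-in for the learned latent representation:

```python
import numpy as np

def patient_risk_score(disease_vec, M, Theta, x):
    """Patient-specific linear risk model: coefficients c come from the
    patient's disease indicator vector via a (hypothetical) learned map M,
    and the patient's weight vector is c @ Theta, where each row of Theta
    is the parameter vector of one latent basis task."""
    c = disease_vec @ M   # latent representation of the patient
    w = c @ Theta         # patient-specific parameter vector
    return float(x @ w)   # risk score for clinical features x
```

Two patients with different disease collections thus get different effective models, while all models share the same small set of basis-task parameters.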


Learning Task Grouping and Overlap in Multi-task Learning

Kumar, Abhishek, Daume, Hal III

arXiv.org Machine Learning

In the paradigm of multi-task learning, multiple related prediction tasks are learned jointly, sharing information across the tasks. We propose a framework for multi-task learning that enables one to selectively share information across the tasks. We assume that each task parameter vector is a linear combination of a finite number of underlying basis tasks. The coefficients of the linear combination are sparse in nature, and the overlap in the sparsity patterns of two tasks controls the amount of sharing between them. Our model is based on the assumption that task parameters within a group lie in a low-dimensional subspace, but allows the tasks in different groups to overlap with each other in one or more bases. Experimental results on four datasets show that our approach outperforms competing methods.
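The parameter structure assumed in this abstract can be made concrete in a few lines: stacking sparse per-task coefficients into S and basis tasks into L gives every task's parameter vector as a column of L @ S, and tasks share exactly those bases where their sparsity patterns overlap. The matrices below are illustrative, not learned:

```python
import numpy as np

def task_parameters(S, L):
    """Each task's parameter vector is a sparse linear combination of
    basis tasks: column t of S holds task t's coefficients, columns of
    L are the latent basis tasks, so W = L @ S has one parameter
    column per task."""
    return L @ S  # (d, k) @ (k, T) -> (d, T)

L = np.array([[1.0, 0.0],
              [0.0, 1.0]])           # two basis tasks in R^2
S = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])      # tasks 1 and 2 use disjoint bases;
                                     # task 3 overlaps both groups
W = task_parameters(S, L)
```

Tasks 1 and 2 have non-overlapping sparsity patterns and share nothing; task 3 has nonzero coefficients on both bases, so it overlaps each group, which is the selective-sharing mechanism the abstract describes.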