Transfer Learning




Learning with Preserving for Continual Multitask Learning

Wang, Hanchen David, Bae, Siwoo, Chen, Zirong, Ma, Meiyi

arXiv.org Artificial Intelligence

Artificial intelligence systems in critical fields like autonomous driving and medical imaging analysis often continually learn new tasks using a shared stream of input data. For instance, after learning to detect traffic signs, a model may later need to learn to classify traffic lights or different types of vehicles using the same camera feed. This scenario introduces a challenging setting we term Continual Multitask Learning (CMTL), where a model sequentially learns new tasks on an underlying data distribution without forgetting previously learned abilities. Existing continual learning methods often fail in this setting because they learn fragmented, task-specific features that interfere with one another. To address this, we introduce Learning with Preserving (LwP), a novel framework that shifts the focus from preserving task outputs to maintaining the geometric structure of the shared representation space. The core of LwP is a Dynamically Weighted Distance Preservation (DWDP) loss that prevents representation drift by regularizing the pairwise distances between latent data representations. This mechanism of preserving the underlying geometric structure allows the model to retain implicit knowledge and support diverse tasks without requiring a replay buffer, making it suitable for privacy-conscious applications. Extensive evaluations on time-series and image benchmarks show that LwP not only mitigates catastrophic forgetting but also consistently outperforms state-of-the-art baselines in CMTL tasks. Notably, our method shows superior robustness to distribution shifts and is the only approach to surpass the strong single-task learning baseline, underscoring its effectiveness for real-world dynamic environments.
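The core idea of the DWDP loss described above — regularizing changes in pairwise distances between latent representations relative to a frozen snapshot of the model — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the weighting matrix here is a hypothetical placeholder for whatever dynamic weighting scheme the paper uses.

```python
import numpy as np

def pairwise_distances(z):
    """Euclidean distance matrix for a batch of latent vectors, shape (n, d)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def dwdp_loss(z_new, z_old, weights=None):
    """Distance-preservation loss (sketch).

    Penalizes drift in pairwise distances between representations from the
    current model (z_new) and a frozen earlier snapshot (z_old).
    `weights` is a hypothetical per-pair weight matrix; uniform if None.
    """
    d_new = pairwise_distances(z_new)
    d_old = pairwise_distances(z_old)
    if weights is None:
        weights = np.ones_like(d_new)
    return float((weights * (d_new - d_old) ** 2).mean())
```

In training, this term would be added to the task losses so that new tasks can be learned while the geometry of the shared representation space is preserved, without storing raw samples in a replay buffer.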


Pre-Trained Model Sets for Target Tasks: Model Universe

Neural Information Processing Systems

We study model reusability evaluation (MRE) for source pre-trained models: evaluating their transfer learning performance on new target tasks. In particular, we focus on the setting in which the target training datasets are small, making it difficult to produce reliable MRE scores from them. In this situation, we propose synergistic learning for building the task-model metric, realized by collecting a set of pre-trained models and asking a group of data providers to participate. We provide theoretical guarantees showing that the learned task-model metric distances can serve as trustworthy MRE scores, and propose synergistic learning algorithms and models for general learning tasks. Experiments show that the MRE models learned by synergistic learning generate significantly more reliable MRE scores than existing approaches for small-data transfer learning.
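The central mechanism above is that distances in a learned task-model metric space act as reusability scores: the closer a pre-trained model's embedding is to a target task's embedding, the better its expected transfer performance. A toy sketch of that scoring step, with entirely hypothetical embeddings (the paper's actual metric is learned synergistically from many tasks and models):

```python
import numpy as np

def mre_score(task_emb, model_emb):
    """Model reusability score as negative metric distance (toy sketch).

    task_emb and model_emb are hypothetical embeddings in a shared metric
    space; a smaller distance means the model is predicted to transfer better.
    """
    return -float(np.linalg.norm(task_emb - model_emb))

def rank_models(task_emb, model_embs):
    """Rank candidate pre-trained models by reusability for one target task."""
    scores = [mre_score(task_emb, m) for m in model_embs]
    return sorted(range(len(model_embs)), key=lambda i: scores[i], reverse=True)
```

The value of such a metric is that ranking candidate models requires no fine-tuning on the (small) target dataset at selection time.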



Appendix: On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources

Neural Information Processing Systems

Appendix C contains the proof of the trade-off theorem discussed in Section 2.4 of our main text. In (4), Fubini's theorem is invoked to swap the integral signs. This bound is novel in that it relates the loss on the input space to the data shift on the feature space. Lemma 4: given a source mixture and a target domain, we have the stated bound. We are now ready to prove the bound that motivates the compressed domain-invariant (DI) representation. The objective function in Eq. (6) can be viewed as training the optimal hypothesis; this estimator is unbiased. For consistency, we also discuss under which conditions the Hellinger loss belongs to the family defined in (10). Model: we train a hypothesis f̂ = ĥ ∘ g and minimize the classification loss with respect to it.





Transfer learning with a deeper backbone: We have now tested transfer learning with an adversarially trained

Neural Information Processing Systems

We thank the reviewers for their insightful comments. A number of additional experiments were suggested, and we have updated our paper to include these results. For example, our adversarially queried R2-D2 model achieves 35.53% robust 5-shot accuracy. Out-of-distribution testing: we have now evaluated our models on Meta-Dataset. For the same models, the robust accuracy on the CUB-200 dataset is 29.04%.