AITopics | Transfer Learning

Collaborating Authors

Transfer Learning

Transfer Learning is the reuse of a pre-trained model on a new problem. (Towards Data Science)

News Overviews Instructional Materials AI-Alerts Classics

Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning

Schulte, David, Hamborg, Felix, Akbik, Alan

arXiv.org Artificial IntelligenceOct-19-2024

Intermediate task transfer learning can greatly improve model performance. If, for example, one has little training data for emotion detection, first fine-tuning a language model on a sentiment classification dataset may improve performance strongly. But which task to choose for transfer learning? Prior methods producing useful task rankings are infeasible for large source pools, as they require forward passes through all source language models. We overcome this by introducing Embedding Space Maps (ESMs), light-weight neural networks that approximate the effect of fine-tuning a language model. We conduct the largest study on NLP task transferability and task selection with 12k source-target pairs. We find that applying ESMs on a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively, while retaining high selection performance (avg. regret@5 score of 2.95).

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2410.15148

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany (0.04)
North America > United States > Oregon (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.83)

Add feedback

Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

Wang, Xinming, Mak, Simon, Miller, John, Wu, Jianguo

arXiv.org Machine LearningOct-16-2024

A critical bottleneck for scientific progress is the costly nature of computer simulations for complex systems. Surrogate models provide an appealing solution: such models are trained on simulator evaluations, then used to emulate and quantify uncertainty on the expensive simulator at unexplored inputs. In many applications, one often has available data on related systems. For example, in designing a new jet turbine, there may be existing studies on turbines with similar configurations. A key question is how information from such "source" systems can be transferred for effective surrogate training on the "target" system of interest. We thus propose a new LOcal transfer Learning Gaussian Process (LOL-GP) model, which leverages a carefully-designed Gaussian process to transfer such information for surrogate modeling. The key novelty of the LOL-GP is a latent regularization model, which identifies regions where transfer should be performed and regions where it should be avoided. This "local transfer" property is desirable in scientific systems: at certain parameters, such systems may behave similarly and thus transfer is beneficial; at other parameters, they may behave differently and thus transfer is detrimental. By accounting for local transfer, the LOL-GP can rectify a critical limitation of "negative transfer" in existing transfer learning models, where the transfer of information worsens predictive performance. We derive a Gibbs sampling algorithm for efficient posterior predictive sampling on the LOL-GP, for both the multi-source and multi-fidelity transfer settings. We then show, via a suite of numerical experiments and an application for jet turbine design, the improved surrogate performance of the LOL-GP over existing methods.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2410.1269

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas (0.67)
Aerospace & Defense (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.84)
(3 more...)

Add feedback

Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework

Yang, Zhengwei, Li, Yuke, Sun, Qiang, Fernando, Basura, Huang, Heng, Wang, Zheng

arXiv.org Artificial IntelligenceOct-14-2024

Most existing studies on few-shot learning focus on unimodal settings, where models are trained to generalize on unseen data using only a small number of labeled examples from the same modality. However, real-world data are inherently multi-modal, and unimodal approaches limit the practical applications of few-shot learning. To address this gap, this paper introduces the Cross-modal Few-Shot Learning (CFSL) task, which aims to recognize instances from multiple modalities when only a few labeled examples are available. This task presents additional challenges compared to classical few-shot learning due to the distinct visual characteristics and structural properties unique to each modality. To tackle these challenges, we propose a Generative Transfer Learning (GTL) framework consisting of two stages: the first stage involves training on abundant unimodal data, and the second stage focuses on transfer learning to adapt to novel data. Our GTL framework jointly estimates the latent shared concept across modalities and in-modality disturbance in both stages, while freezing the generative module during the transfer phase to maintain the stability of the learned representations and prevent overfitting to the limited multi-modal samples. Our finds demonstrate that GTL has superior performance compared to state-of-the-art methods across four distinct multi-modal datasets: Sketchy, TU-Berlin, Mask1K, and SKSF-A. Additionally, the results suggest that the model can estimate latent concepts from vast unimodal data and generalize these concepts to unseen modalities using only a limited number of available samples, much like human cognitive processes.

artificial intelligence, machine learning, modality, (15 more...)

arXiv.org Artificial Intelligence

2410.10663

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > California (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.90)

Add feedback

Private Synthetic Data for Multitask Learning and Marginal Queries

Neural Information Processing SystemsOct-11-2024, 15:22:20 GMT

We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into {high cardinality} categorical features via {a binning strategy}. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data.

multitask learning and marginal query, numerical feature, private synthetic data, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.40)

Add feedback

Learning-to-learn non-convex piecewise-Lipschitz functions

Neural Information Processing SystemsOct-11-2024, 10:57:30 GMT

We analyze the meta-learning of the initialization and step-size of learning algorithms for piecewise-Lipschitz functions, a non-convex setting with applications to both machine learning and algorithms. Starting from recent regret bounds for the exponential forecaster on losses with dispersed discontinuities, we generalize them to be initialization-dependent and then use this result to propose a practical meta-learning procedure that learns both the initialization and the step-size of the algorithm from multiple online learning tasks. Asymptotically, we guarantee that the average regret across tasks scales with a natural notion of task-similarity that measures the amount of overlap between near-optimal regions of different tasks.

algorithm, learning-to-learn non-convex piecewise-lipschitz function, piecewise-lipschitz function

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.40)

Add feedback

Online Multitask Learning with Long-Term Memory

Neural Information Processing SystemsOct-11-2024, 09:39:29 GMT

We introduce a novel online multitask setting. In this setting each task is partitioned into a sequence of segments that is unknown to the learner. Associated with each segment is a hypothesis from some hypothesis class. We give algorithms that are designed to exploit the scenario where there are many such segments but significantly fewer associated hypotheses. We prove regret bounds that hold for any segmentation of the tasks and any association of hypotheses to the segments.

algorithm, hypothesis class, online multitask learning, (1 more...)

Neural Information Processing Systems

Genre: Instructional Material > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.40)

Add feedback

Co-Tuning for Transfer Learning

Neural Information Processing SystemsOct-11-2024, 08:21:57 GMT

Fine-tuning pre-trained deep neural networks (DNNs) to a target dataset, also known as transfer learning, is widely used in computer vision and NLP. Because task-specific layers mainly contain categorical information and categories vary with datasets, practitioners only \textit{partially} transfer pre-trained models by discarding task-specific layers and fine-tuning bottom layers. However, it is a reckless loss to simply discard task-specific parameters who take up as many as 20\% of the total parameters in pre-trained models. To \textit{fully} transfer pre-trained models, we propose a two-step framework named \textbf{Co-Tuning}: (i) learn the relationship between source categories and target categories from the pre-trained model and calibrated predictions; (ii) target labels (one-hot labels), as well as source labels (probabilistic labels) translated by the category relationship, collaboratively supervise the fine-tuning process. A simple instantiation of the framework shows strong empirical results in four visual classification tasks and one NLP classification task, bringing up to 20\% relative improvement. While state-of-the-art fine-tuning techniques mainly focus on how to impose regularization when data are not abundant, Co-Tuning works not only in medium-scale datasets (100 samples per class) but also in large-scale datasets (1000 samples per class) where regularization-based methods bring no gains over the vanilla fine-tuning.

co-tuning, dataset, pre-trained model, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Transfer Learning with Affine Model Transformation

Neural Information Processing SystemsOct-11-2024, 05:27:32 GMT

Supervised transfer learning has received considerable attention due to its potential to boost the predictive power of machine learning in scenarios where data are scarce. Generally, a given set of source models and a dataset from a target domain are used to adapt the pre-trained models to a target domain by statistically learning domain shift and domain-specific factors. While such procedurally and intuitively plausible methods have achieved great success in a wide range of real-world applications, the lack of a theoretical basis hinders further methodological development. This paper presents a general class of transfer learning regression called affine model transfer, following the principle of expected-square loss minimization. It is shown that the affine model transfer broadly encompasses various existing methods, including the most common procedure based on neural feature extractors.

affine model transfer, affine model transformation, transfer learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.90)

Add feedback

Transfusion: Understanding Transfer Learning for Medical Imaging

Neural Information Processing SystemsOct-11-2024, 05:01:12 GMT

Transfer learning from natural image datasets, particularly ImageNet, using standard large models and corresponding pretrained weights has become a de-facto method for deep learning applications to medical imaging. However, there are fundamental differences in data sizes, features and task specifications between natural image classification and the target medical tasks, and there is little understanding of the effects of transfer. In this paper, we explore properties of transfer learning for medical imaging. A performance evaluation on two large scale medical imaging tasks shows that surprisingly, transfer offers little benefit to performance, and simple, lightweight models can perform comparably to ImageNet architectures. Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse.

medical imaging, transfer learning, transfusion

Neural Information Processing Systems

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

Shared Space Transfer Learning for analyzing multi-site fMRI data

Neural Information Processing SystemsOct-11-2024, 04:39:42 GMT

Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data, for distinguishing when subjects are performing different cognitive tasks -- e.g., watching movies or making decisions. MVPA works best with a well-designed feature set and an adequate sample size. However, most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes. Further, training a robust, generalized predictive model that can analyze homogeneous cognitive tasks provided by multi-site fMRI datasets has additional challenges. This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning (TL) approach that can functionally align homogeneous multi-site fMRI datasets, and so improve the prediction performance in every site. It then uses TL to map these site-specific features to a site-independent shared space in order to improve the performance of the MVPA.

fmri dataset, multi-site fmri data, shared space transfer learning, (6 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.88)

Add feedback