Transfer Learning
Transfer Learning for Security: Challenges and Future Directions
Li, Adrian Shuai, Iyengar, Arun, Kundu, Ashish, Bertino, Elisa
Many machine learning and data mining algorithms rely on the assumption that the training and testing data share the same feature space and distribution. However, this assumption may not always hold. For instance, there are situations where we need to classify data in one domain, but we only have sufficient training data available from a different domain. The latter data may follow a distinct distribution. In such cases, successfully transferring knowledge across domains can significantly improve learning performance and reduce the need for extensive data labeling efforts. Transfer learning (TL) has thus emerged as a promising framework to tackle this challenge, particularly in security-related tasks. This paper aims to review the current advancements in utilizing TL techniques for security. The paper includes a discussion of the existing research gaps in applying TL in the security domain, as well as exploring potential future research directions and issues that arise in the context of TL-assisted security solutions.
A Regularization-based Transfer Learning Method for Information Extraction via Instructed Graph Decoder
Chen, Kedi, Zhou, Jie, Chen, Qin, Liu, Shunyu, He, Liang
Information extraction (IE) aims to extract complex structured information from the text. Numerous datasets have been constructed for various IE tasks, leading to time-consuming and labor-intensive data annotations. Nevertheless, most prevailing methods focus on training task-specific models, while the common knowledge among different IE tasks is not explicitly modeled. Moreover, the same phrase may have inconsistent labels in different tasks, which poses a big challenge for knowledge transfer using a unified model. In this study, we propose a regularization-based transfer learning method for IE (TIE) via an instructed graph decoder. Specifically, we first construct an instruction pool for datasets from all well-known IE tasks, and then present an instructed graph decoder, which decodes various complex structures into a graph uniformly based on corresponding instructions. In this way, the common knowledge shared with existing datasets can be learned and transferred to a new dataset with new labels. Furthermore, to alleviate the label inconsistency problem among various IE tasks, we introduce a task-specific regularization strategy, which does not update the gradients of two tasks with'opposite direction'. We conduct extensive experiments on 12 datasets spanning four IE tasks, and the results demonstrate the great advantages of our proposed method.
Transfer Learning Bayesian Optimization to Design Competitor DNA Molecules for Use in Diagnostic Assays
Sedgwick, Ruby, Goertz, John P., Stevens, Molly M., Misener, Ruth, van der Wilk, Mark
With the rise in engineered biomolecular devices, there is an increased need for tailor-made biological sequences. Often, many similar biological sequences need to be made for a specific application meaning numerous, sometimes prohibitively expensive, lab experiments are necessary for their optimization. This paper presents a transfer learning design of experiments workflow to make this development feasible. By combining a transfer learning surrogate model with Bayesian optimization, we show how the total number of experiments can be reduced by sharing information between optimization tasks. We demonstrate the reduction in the number of experiments using data from the development of DNA competitors for use in an amplification-based diagnostic assay. We use cross-validation to compare the predictive accuracy of different transfer learning models, and then compare the performance of the models for both single objective and penalized optimization tasks.
Adversarial-Robust Transfer Learning for Medical Imaging via Domain Assimilation
In the field of Medical Imaging, extensive research has been dedicated to leveraging its potential in uncovering critical diagnostic features in patients. Artificial Intelligence (AI)-driven medical diagnosis relies on sophisticated machine learning and deep learning models to analyze, detect, and identify diseases from medical images. Despite the remarkable performance of these models, characterized by high accuracy, they grapple with trustworthiness issues. The introduction of a subtle perturbation to the original image empowers adversaries to manipulate the prediction output, redirecting it to other targeted or untargeted classes. Furthermore, the scarcity of publicly available medical images, constituting a bottleneck for reliable training, has led contemporary algorithms to depend on pretrained models grounded on a large set of natural images -- a practice referred to as transfer learning. However, a significant {\em domain discrepancy} exists between natural and medical images, which causes AI models resulting from transfer learning to exhibit heightened {\em vulnerability} to adversarial attacks. This paper proposes a {\em domain assimilation} approach that introduces texture and color adaptation into transfer learning, followed by a texture preservation component to suppress undesired distortion. We systematically analyze the performance of transfer learning in the face of various adversarial attacks under different data modalities, with the overarching goal of fortifying the model's robustness and security in medical imaging tasks. The results demonstrate high effectiveness in reducing attack efficacy, contributing toward more trustworthy transfer learning in biomedical applications.
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
Long, Zijun, Killick, George, Zhuang, Lipeng, Aragon-Camarasa, Gerardo, Meng, Zaiqiao, Mccreadie, Richard
State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource intensive and slow training using large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks, achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in transfer learning settings with the BEiT-3 model. Importantly, our proposed CLCE approach effectively mitigates the dependency of contrastive learning on large batch sizes such as 4096 samples per batch, a limitation that has previously constrained the application of contrastive learning in budget-limited hardware environments.
Smoothness Adaptive Hypothesis Transfer Learning
Lin, Haotian, Reimherr, Matthew
Many existing two-phase kernel-based hypothesis transfer learning algorithms employ the same kernel regularization across phases and rely on the known smoothness of functions to obtain optimality. Therefore, they fail to adapt to the varying and unknown smoothness between the target/source and their offset in practice. In this paper, we address these problems by proposing Smoothness Adaptive Transfer Learning (SATL), a two-phase kernel ridge regression(KRR)-based algorithm. We first prove that employing the misspecified fixed bandwidth Gaussian kernel in target-only KRR learning can achieve minimax optimality and derive an adaptive procedure to the unknown Sobolev smoothness. Leveraging these results, SATL employs Gaussian kernels in both phases so that the estimators can adapt to the unknown smoothness of the target/source and their offset function. We derive the minimax lower bound of the learning problem in excess risk and show that SATL enjoys a matching upper bound up to a logarithmic factor. The minimax convergence rate sheds light on the factors influencing transfer dynamics and demonstrates the superiority of SATL compared to non-transfer learning settings. While our main objective is a theoretical analysis, we also conduct several experiments to confirm our results.
Predicting trucking accidents with truck drivers 'safety climate perception across companies: A transfer learning approach
Sun, Kailai, Lan, Tianxiang, Kam, Say Hong, Goh, Yang Miang, Huang, Yueng-Hsiang
There is a rising interest in using artificial intelligence (AI)-powered safety analytics to predict accidents in the trucking industry. Companies may face the practical challenge, however, of not having enough data to develop good safety analytics models. Although pretrained models may offer a solution for such companies, existing safety research using transfer learning has mostly focused on computer vision and natural language processing, rather than accident analytics. To fill the above gap, we propose a pretrain-then-fine-tune transfer learning approach to help any company leverage other companies' data to develop AI models for a more accurate prediction of accident risk. We also develop SafeNet, a deep neural network algorithm for classification tasks suitable for accident prediction. Using the safety climate survey data from seven trucking companies with different data sizes, we show that our proposed approach results in better model performance compared to training the model from scratch using only the target company's data. We also show that for the transfer learning model to be effective, the pretrained model should be developed with larger datasets from diverse sources. The trucking industry may, thus, consider pooling safety analytics data from a wide range of companies to develop pretrained models and share them within the industry for better knowledge and resource transfer. The above contributions point to the promise of advanced safety analytics to make the industry safer and more sustainable.
BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation
Lee, Daeun, Yoon, Jaehong, Hwang, Sung Ju
Continual Test Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge. However, despite the progress of CTTA, forgetting-adaptation trade-offs and efficiency are still unexplored. Moreover, current CTTA scenarios assume only the disjoint situation, even though real-world domains are seamlessly changed. To tackle these challenges, this paper proposes BECoTTA, an input-dependent yet efficient framework for CTTA. We propose Mixture-of-Domain Low-rank Experts (MoDE) that contains two core components: (i) Domain-Adaptive Routing, which aids in selectively capturing the domain-adaptive knowledge with multiple domain routers, and (ii) Domain-Expert Synergy Loss to maximize the dependency between each domain and expert. We validate our method outperforms multiple CTTA scenarios including disjoint and gradual domain shits, while only requiring ~98% fewer trainable parameters. We also provide analyses of our method, including the construction of experts, the effect of domain-adaptive experts, and visualizations.
Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning
Lee, Haeju, Jeong, Minchan, Yun, Se-Young, Kim, Kee-Eung
Prompt tuning, in which prompts are optimized to adapt large-scale pre-trained language models to downstream tasks instead of fine-tuning the full model parameters, has been shown to be particularly effective when the prompts are trained in a multi-task transfer learning setting. These methods generally involve individually training prompts for each source task and then aggregating them to provide the initialization of the prompt for the target task. However, this approach critically ignores the fact that some of the source tasks could be negatively or positively interfering with each other. We argue that when we extract knowledge from source tasks via training source prompts, we need to consider this correlation among source tasks for better transfer to target tasks. To this end, we propose a Bayesian approach where we work with the posterior distribution of prompts across source tasks. We obtain representative source prompts corresponding to the samples from the posterior utilizing Stein Variational Gradient Descent, which are then aggregated to constitute the initial target prompt. We show extensive experimental results on the standard benchmark NLP tasks, where our Bayesian multi-task transfer learning approach outperforms the state-of-the-art methods in many settings. Furthermore, our approach requires no auxiliary models other than the prompt itself, achieving a high degree of parameter efficiency.
Transfer learning with generative models for object detection on limited datasets
Paiano, Matteo, Martina, Stefano, Giannelli, Carlotta, Caruso, Filippo
The availability of data is limited in some fields, especially for object detection tasks, where it is necessary to have correctly labeled bounding boxes around each object. A notable example of such data scarcity is found in the domain of marine biology, where it is useful to develop methods to automatically detect submarine species for environmental monitoring. To address this data limitation, the state-of-the-art machine learning strategies employ two main approaches. The first involves pretraining models on existing datasets before generalizing to the specific domain of interest. The second strategy is to create synthetic datasets specifically tailored to the target domain using methods like copy-paste techniques or ad-hoc simulators. The first strategy often faces a significant domain shift, while the second demands custom solutions crafted for the specific task. In response to these challenges, here we propose a transfer learning framework that is valid for a generic scenario. In this framework, generated images help to improve the performances of an object detector in a few-real data regime. This is achieved through a diffusion-based generative model that was pretrained on large generic datasets, and is not trained on the task-specific domain. We validate our approach on object detection tasks, specifically focusing on fishes in an underwater environment, and on the more common domain of cars in an urban setting. Our method achieves detection performance comparable to models trained on thousands of images, using only a few hundreds of input data. Our results pave the way for new generative AI-based protocols for machine learning applications in various domains, for instance ranging from geophysics to biology and medicine.