Transfer Learning
Trade-off between reconstruction loss and feature alignment for domain generalization
Nguyen, Thuan, Lyu, Boyang, Ishwar, Prakash, Scheutz, Matthias, Aeron, Shuchin
Domain generalization (DG) is a branch of transfer learning that aims to train the learning models on several seen domains and subsequently apply these pre-trained models to other unseen (unknown but related) domains. To deal with challenging settings in DG where both data and label of the unseen domain are not available at training time, the most common approach is to design the classifiers based on the domain-invariant representation features, i.e., the latent representations that are unchanged and transferable between domains. Contrary to popular belief, we show that designing classifiers based on invariant representation features alone is necessary but insufficient in DG. Our analysis indicates the necessity of imposing a constraint on the reconstruction loss induced by representation functions to preserve most of the relevant information about the label in the latent space. More importantly, we point out the trade-off between minimizing the reconstruction loss and achieving domain alignment in DG. Our theoretical results motivate a new DG framework that jointly optimizes the reconstruction loss and the domain discrepancy. Both theoretical and numerical results are provided to justify our approach.
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Üstün, Ahmet, Bisazza, Arianna, Bouma, Gosse, van Noord, Gertjan, Ruder, Sebastian
Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a single hypernetwork that unifies multi-task and multilingual learning with efficient adaptation. This model generates weights for adapter modules conditioned on both tasks and language embeddings. By learning to combine task and language-specific knowledge, our model enables zero-shot transfer for unseen languages and task-language combinations. Our experiments on a diverse set of languages demonstrate that Hyper-X achieves the best or competitive gain when a mixture of multiple resources is available, while being on par with strong baselines in the standard scenario. Hyper-X is also considerably more efficient in terms of parameters and resources compared to methods that train separate adapters. Finally, Hyper-X consistently produces strong results in few-shot scenarios for new languages, showing the versatility of our approach beyond zero-shot transfer.
Transfer Learning of Lexical Semantic Families for Argumentative Discourse Units Identification
Rodrigues, João, Branco, Ruben, Branco, António
Argument mining tasks require an informed range of low to high complexity linguistic phenomena and commonsense knowledge. Previous work has shown that pre-trained language models are highly effective at encoding syntactic and semantic linguistic phenomena when applied with transfer learning techniques and built on different pre-training objectives. It remains an issue of how much the existing pre-trained language models encompass the complexity of argument mining tasks. We rely on experimentation to shed light on how language models obtained from different lexical semantic families leverage the performance of the identification of argumentative discourse units task. Experimental results show that transfer learning techniques are beneficial to the task and that current methods may be insufficient to leverage commonsense knowledge from different lexical semantic families.
Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations in a Label-Abundant Setup
Yordanov, Yordan, Kocijan, Vid, Lukasiewicz, Thomas, Camburu, Oana-Maria
Training a model to provide natural language explanations (NLEs) for its predictions usually requires the acquisition of task-specific NLEs, which is time- and resource-consuming. A potential solution is the few-shot out-of-domain transfer of NLEs from a parent task with many NLEs to a child task. In this work, we examine the setup in which the child task has few NLEs but abundant labels. We establish four few-shot transfer learning methods that cover the possible fine-tuning combinations of the labels and NLEs for the parent and child tasks. We transfer explainability from a large natural language inference dataset (e-SNLI) separately to two child tasks: (1) hard cases of pronoun resolution, where we introduce the small-e-WinoGrande dataset of NLEs on top of the WinoGrande dataset, and (2)~commonsense validation (ComVE). Our results demonstrate that the parent task helps with NLE generation and we establish the best methods for this setup.
On effects of Knowledge Distillation on Transfer Learning
Knowledge distillation is a popular machine learning technique that aims to transfer knowledge from a large 'teacher' network to a smaller 'student' network and improve the student's performance by training it to emulate the teacher. In recent years, there has been significant progress in novel distillation techniques that push performance frontiers across multiple problems and benchmarks. Most of the reported work focuses on achieving state-of-the-art results on the specific problem. However, there has been a significant gap in understanding the process and how it behaves under certain training scenarios. Similarly, transfer learning (TL) is an effective technique in training neural networks on a limited dataset faster by reusing representations learned from a different but related problem. Despite its effectiveness and popularity, there has not been much exploration of knowledge distillation on transfer learning. In this thesis, we propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning; we then present a quantitative and qualitative comparison of TL+KD with TL in the domain of image classification. Through this work, we show that using guidance and knowledge from a larger teacher network during fine-tuning, we can improve the student network to achieve better validation performances like accuracy. We characterize the improvement in the validation performance of the model using a variety of metrics beyond just accuracy scores, and study its performance in scenarios such as input degradation.
Transfer-learning for video classification: Video Swin Transformer on multiple domains
Oliveira, Daniel, de Matos, David Martins
The computer vision community has seen a shift from convolutional-based to pure transformer architectures for both image and video tasks. Training a transformer from zero for these tasks usually requires a lot of data and computational resources. Video Swin Transformer (VST) is a pure-transformer model developed for video classification which achieves state-of-the-art results in accuracy and efficiency on several datasets. In this paper, we aim to understand if VST generalizes well enough to be used in an out-of-domain setting. We study the performance of VST on two large-scale datasets, namely FCVID and Something-Something using a transfer learning approach from Kinetics-400, which requires around 4x less memory than training from scratch. We then break down the results to understand where VST fails the most and in which scenarios the transfer-learning approach is viable. Our experiments show an 85\% top-1 accuracy on FCVID without retraining the whole model which is equal to the state-of-the-art for the dataset and a 21\% accuracy on Something-Something. The experiments also suggest that the performance of the VST decreases on average when the video duration increases which seems to be a consequence of a design choice of the model. From the results, we conclude that VST generalizes well enough to classify out-of-domain videos without retraining when the target classes are from the same type as the classes used to train the model. We observed this effect when we performed transfer-learning from Kinetics-400 to FCVID, where most datasets target mostly objects. On the other hand, if the classes are not from the same type, then the accuracy after the transfer-learning approach is expected to be poor. We observed this effect when we performed transfer-learning from Kinetics-400, where the classes represent mostly objects, to Something-Something, where the classes represent mostly actions.
CAREER: Transfer Learning for Economic Prediction of Labor Sequence Data
Vafa, Keyon, Palikot, Emil, Du, Tianyu, Kanodia, Ayush, Athey, Susan, Blei, David M.
Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although modern machine learning methods offer promise for such problems, these survey datasets are too small to take advantage of them. In recent years large datasets of online resumes have also become available, providing data about the career trajectories of millions of individuals. However, standard econometric models cannot take advantage of their scale or incorporate them into the analysis of survey data. To this end we develop CAREER, a transformer-based model that uses transfer learning to learn representations of job sequences. CAREER is first fit to large, passively-collected resume data and then fine-tuned to smaller, better-curated datasets for economic inferences. We fit CAREER to a dataset of 24 million job sequences from resumes, and fine-tune its representations on longitudinal survey datasets. We find that CAREER forms accurate predictions of job sequences on three widely-used economics datasets. We further find that CAREER can be used to form good predictions of other downstream variables; incorporating CAREER into a wage model provides better predictions than the econometric models currently in use.
Transfer Learning on Electromyography (EMG) Tasks: Approaches and Beyond
Wu, Di, Yang, Jie, Sawan, Mohamad
Machine learning on electromyography (EMG) has recently achieved remarkable success on a variety of tasks, while such success relies heavily on the assumption that the training and future data must be of the same data distribution. However, this assumption may not hold in many real-world applications. Model calibration is required via data re-collection and label annotation, which is generally very expensive and time-consuming. To address this problem, transfer learning (TL), which aims to improve target learners' performance by transferring the knowledge from related source domains, is emerging as a new paradigm to reduce the amount of calibration effort. In this survey, we assess the eligibility of more than fifty published peer-reviewed representative transfer learning approaches for EMG applications. Unlike previous surveys on purely transfer learning or EMG-based machine learning, this survey aims to provide an insight into the biological foundations of existing transfer learning methods on EMG-related analysis. In specific, we first introduce the physiological structure of the muscles and the EMG generating mechanism, and the recording of EMG to provide biological insights behind existing transfer learning approaches. Further, we categorize existing research endeavors into data based, model based, training scheme based, and adversarial based. This survey systematically summarizes and categorizes existing transfer learning approaches for EMG related machine learning applications. In addition, we discuss possible drawbacks of existing works and point out the future direction of better EMG transfer learning algorithms to enhance practicality for real-world applications.
Zero-shot Transfer Learning within a Heterogeneous Graph via Knowledge Transfer Networks
Yoon, Minji, Palowitch, John, Zelle, Dustin, Hu, Ziniu, Salakhutdinov, Ruslan, Perozzi, Bryan
Data continuously emitted from industrial ecosystems such as social or e-commerce platforms are commonly represented as heterogeneous graphs (HG) composed of multiple node/edge types. State-of-the-art graph learning methods for HGs known as heterogeneous graph neural networks (HGNNs) are applied to learn deep context-informed node representations. However, many HG datasets from industrial applications suffer from label imbalance between node types. As there is no direct way to learn using labels rooted at different node types, HGNNs have been applied to only a few node types with abundant labels. We propose a zero-shot transfer learning module for HGNNs called a Knowledge Transfer Network (KTN) that transfers knowledge from label-abundant node types to zero-labeled node types through rich relational information given in the HG. KTN is derived from the theoretical relationship, which we introduce in this work, between distinct feature extractors for each node type given in an HGNN model. KTN improves performance of 6 different types of HGNN models by up to 960% for inference on zero-labeled node types and outperforms state-of-the-art transfer learning baselines by up to 73% across 18 different transfer learning tasks on HGs.
Fault Diagnosis using eXplainable AI: a Transfer Learning-based Approach for Rotating Machinery exploiting Augmented Synthetic Data
Brito, Lucas Costa, Susto, Gian Antonio, Brito, Jorge Nei, Duarte, Marcus Antonio Viana
Artificial Intelligence (AI) is one of the approaches that has been proposed to analyze the collected data (e.g., vibration signals) providing a diagnosis of the asset's operating condition. It is known that models trained with labeled data (supervised) achieve excellent results, but two main problems make their application in production processes difficult: (i) impossibility or long time to obtain a sample of all operational conditions (since faults seldom happen) and (ii) high cost of experts to label all acquired data. Another limitating factor for the applicability of AI approaches in this context is the lack of interpretability of the models (black-boxes), which reduces the confidence of the diagnosis and trust/adoption from users. To overcome these problems, a new generic and interpretable approach for classifying faults in rotating machinery based on transfer learning from augmented synthetic data to real rotating machinery is here proposed, namelly FaultD-XAI (Fault Diagnosis using eXplainable AI). To provide scalability using transfer learning, synthetic vibration signals are created mimicking the characteristic behavior of failures in operation. The application of Gradient-weighted Class Activation Mapping (Grad-CAM) with 1D Convolutional Neural Network (1D CNN) allows the interpretation of results, supporting the user in decision making and increasing diagnostic confidence. The proposed approach not only obtained promising diagnostic performance, but was also able to learn characteristics used by experts to identify conditions in a source domain and apply them in another target domain. The experimental results suggest a promising approach on exploiting transfer learning, synthetic data and explainable artificial intelligence for fault diagnosis. Lastly, to guarantee reproducibility and foster research in the field, the developed dataset is made publicly available.