Transfer Learning
A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck
Zhou, Jie, Zhang, Qi, Chen, Qin, He, Liang, Huang, Xuanjing
Event argument extraction (EAE) aims to extract arguments with given roles from texts, which have been widely studied in natural language processing. Most previous works have achieved good performance in specific EAE datasets with dedicated neural architectures. Whereas, these architectures are usually difficult to adapt to new datasets/scenarios with various annotation schemas or formats. Furthermore, they rely on large-scale labeled data for training, which is unavailable due to the high labelling cost in most cases. In this paper, we propose a multi-format transfer learning model with variational information bottleneck, which makes use of the information especially the common knowledge in existing datasets for EAE in new datasets. Specifically, we introduce a shared-specific prompt framework to learn both format-shared and format-specific knowledge from datasets with different formats. In order to further absorb the common knowledge for EAE and eliminate the irrelevant noise, we integrate variational information bottleneck into our architecture to refine the shared representation. We conduct extensive experiments on three benchmark datasets, and obtain new state-of-the-art performance on EAE.
Introduction to Neural Transfer Learning with Transformers for Social Science Text Analysis
Transformer-based models for transfer learning have the potential to achieve high prediction accuracies on text-based supervised learning tasks with relatively few training data instances. These models are thus likely to benefit social scientists that seek to have as accurate as possible text-based measures but only have limited resources for annotating training data. To enable social scientists to leverage these potential benefits for their research, this paper explains how these methods work, why they might be advantageous, and what their limitations are. Additionally, three Transformer-based models for transfer learning, BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), and the Longformer (Beltagy et al. 2020), are compared to conventional machine learning algorithms on three applications. Across all evaluated tasks, textual styles, and training data set sizes, the conventional models are consistently outperformed by transfer learning with Transformers, thereby demonstrating the benefits these models can bring to text-based social science research.
Transfer learning to decode brain states reflecting the relationship between cognitive tasks
Qu, Youzhi, Jian, Xinyao, Che, Wenxin, Du, Penghui, Fu, Kai, Liu, Quanying
Transfer learning improves the performance of the target task by leveraging the data of a specific source task: the closer the relationship between the source and the target tasks, the greater the performance improvement by transfer learning. In neuroscience, the relationship between cognitive tasks is usually represented by similarity of activated brain regions or neural representation. However, no study has linked transfer learning and neuroscience to reveal the relationship between cognitive tasks. In this study, we propose a transfer learning framework to reflect the relationship between cognitive tasks, and compare the task relations reflected by transfer learning and by the overlaps of brain regions (e.g., neurosynth). Our results of transfer learning create cognitive taskonomy to reflect the relationship between cognitive tasks which is well in line with the task relations derived from neurosynth. Transfer learning performs better in task decoding with fMRI data if the source and target cognitive tasks activate similar brain regions. Our study uncovers the relationship of multiple cognitive tasks and provides guidance for source task selection in transfer learning for neural decoding based on small-sample data.
SPATL: Salient Parameter Aggregation and Transfer Learning for Heterogeneous Clients in Federated Learning
Yu, Sixing, Nguyen, Phuong, Abebe, Waqwoya, Qian, Wei, Anwar, Ali, Jannesari, Ali
Federated learning~(FL) facilitates the training and deploying AI models on edge devices. Preserving user data privacy in FL introduces several challenges, including expensive communication costs, limited resources, and data heterogeneity. In this paper, we propose SPATL, an FL method that addresses these issues by: (a) introducing a salient parameter selection agent and communicating selected parameters only; (b) splitting a model into a shared encoder and a local predictor, and transferring its knowledge to heterogeneous clients via the locally customized predictor. Additionally, we leverage a gradient control mechanism to further speed up model convergence and increase robustness of training processes. Experiments demonstrate that SPATL reduces communication overhead, accelerates model inference, and enables stable training processes with better results compared to state-of-the-art methods. Our approach reduces communication cost by up to $86.45\%$, accelerates local inference by reducing up to $39.7\%$ FLOPs on VGG-11, and requires $7.4 \times$ less communication overhead when training ResNet-20.
Transfer Learning-based State of Health Estimation for Lithium-ion Battery with Cycle Synchronization
Zhou, Kate Qi, Qin, Yan, Yuen, Chau
Accurately estimating a battery's state of health (SOH) helps prevent battery-powered applications from failing unexpectedly. With the superiority of reducing the data requirement of model training for new batteries, transfer learning (TL) emerges as a promising machine learning approach that applies knowledge learned from a source battery, which has a large amount of data. However, the determination of whether the source battery model is reasonable and which part of information can be transferred for SOH estimation are rarely discussed, despite these being critical components of a successful TL. To address these challenges, this paper proposes an interpretable TL-based SOH estimation method by exploiting the temporal dynamic to assist transfer learning, which consists of three parts. First, with the help of dynamic time warping, the temporal data from the discharge time series are synchronized, yielding the warping path of the cycle-synchronized time series responsible for capacity degradation over cycles. Second, the canonical variates retrieved from the spatial path of the cycle-synchronized time series are used for distribution similarity analysis between the source and target batteries. Third, when the distribution similarity is within the predefined threshold, a comprehensive target SOH estimation model is constructed by transferring the common temporal dynamics from the source SOH estimation model and compensating the errors with a residual model from the target battery. Through a widely-used open-source benchmark dataset, the estimation error of the proposed method evaluated by the root mean squared error is as low as 0.0034 resulting in a 77% accuracy improvement compared with existing methods.
How to Fine-tune Models with Few Samples: Update, Data Augmentation, and Test-time Augmentation
Kim, Yujin, Oh, Jaehoon, Kim, Sungnyun, Yun, Se-Young
Most of the recent few-shot learning (FSL) algorithms are based on transfer learning, where a model is pre-trained using a large amount of source data, and the pre-trained model is fine-tuned using a small amount of target data. In transfer learning-based FSL, sophisticated pre-training methods have been widely studied for universal representation. Therefore, it has become more important to utilize the universal representation for downstream tasks, but there are few studies on fine-tuning in FSL. In this paper, we focus on how to transfer pre-trained models to few-shot downstream tasks from the three perspectives: update, data augmentation, and test-time augmentation. First, we compare the two popular update methods, full fine-tuning (i.e., updating the entire network, FT) and linear probing (i.e., updating only a linear classifier, LP). We find that LP is better than FT with extremely few samples, whereas FT outperforms LP as training samples increase. Next, we show that data augmentation cannot guarantee few-shot performance improvement and investigate the effectiveness of data augmentation based on the intensity of augmentation. Finally, we adopt augmentation to both a support set for update (i.e., data augmentation) as well as a query set for prediction (i.e., test-time augmentation), considering support-query distribution shifts, and improve few-shot performance. The code is available at https://github.com/kimyuji/updating_FSL.
Effective Transfer Learning for Low-Resource Natural Language Understanding
Natural language understanding (NLU) is the task of semantic decoding of human languages by machines. NLU models rely heavily on large training data to ensure good performance. However, substantial languages and domains have very few data resources and domain experts. It is necessary to overcome the data scarcity challenge, when very few or even zero training samples are available. In this thesis, we focus on developing cross-lingual and cross-domain methods to tackle the low-resource issues. First, we propose to improve the model's cross-lingual ability by focusing on the task-related keywords, enhancing the model's robustness and regularizing the representations. We find that the representations for low-resource languages can be easily and greatly improved by focusing on just the keywords. Second, we present Order-Reduced Modeling methods for the cross-lingual adaptation, and find that modeling partial word orders instead of the whole sequence can improve the robustness of the model against word order differences between languages and task knowledge transfer to low-resource languages. Third, we propose to leverage different levels of domain-related corpora and additional masking of data in the pre-training for the cross-domain adaptation, and discover that more challenging pre-training can better address the domain discrepancy issue in the task knowledge transfer. Finally, we introduce a coarse-to-fine framework, Coach, and a cross-lingual and cross-domain parsing framework, X2Parser. Coach decomposes the representation learning process into a coarse-grained and a fine-grained feature learning, and X2Parser simplifies the hierarchical task structures into flattened ones. We observe that simplifying task structures makes the representation learning more effective for low-resource languages and domains.
A survey of transfer learning for machinery diagnostics and prognostics - Artificial Intelligence Review
In industrial manufacturing systems, failures of machines caused by faults in their key components greatly influence operational safety and system reliability. Many data-driven methods have been developed for machinery diagnostics and prognostics. However, there lacks sufficient labeled data to train a high-performance data-driven model. Moreover, machinery datasets are usually collected from different operation conditions and mechanical components, leading to poor model generalization. To address these concerns, cross-domain transfer learning methods are applied to enhance the feasibility and accuracy of data-driven methods for machinery diagnostics and prognostics.
Generative Transfer Learning: Covid-19 Classification with a few Chest X-ray Images
Kadam, Suvarna, Vaidya, Vinay G.
Detection of diseases through medical imaging is preferred due to its non-invasive nature. Medical imaging supports multiple modalities of data that enable a thorough and quick look inside a human body. However, interpreting imaging data is often time-consuming and requires a great deal of human expertise. Deep learning models can expedite interpretation and alleviate the work of human experts. However, these models are data-intensive and require significant labeled images for training. During novel disease outbreaks such as Covid-19, we often do not have the required labeled imaging data, especially at the start of the epidemic. Deep Transfer Learning addresses this problem by using a pretrained model in the public domain, e.g. any variant of either VGGNet, ResNet, Inception, DenseNet, etc., as a feature learner to quickly adapt the target task from fewer samples. Most pretrained models are deep with complex architectures. They are trained with large multi-class datasets such as ImageNet, with significant human efforts in architecture design and hyper parameters tuning. We presented 1 a simpler generative source model, pretrained on a single but related concept, can perform as effectively as existing larger pretrained models. We demonstrate the usefulness of generative transfer learning that requires less compute and training data, for Few Shot Learning (FSL) with a Covid-19 binary classification use case. We compare classic deep transfer learning with our approach and also report FSL results with three settings of 84, 20, and 10 training samples. The model implementation of generative FSL for Covid-19 classification is available publicly at https://github.com/suvarnak/GenerativeFSLCovid.git.
How to transfer the knowledge from GPT3 to small private models
A bigger teacher model gives information to a smaller student model. Extracted Understanding can be better, sometimes even more excellent than the Details chosen by humans. What is the distillation of knowledge? Knowledge distillation is the process of moving information from a big model to a single, more manageable model that may be used in real-world applications. In essence, it is a kind of model compression.