Transfer Learning
Predictive Minisci and P450 Late Stage Functionalization with Transfer Learning
Structural diversification of lead molecules is a key component of drug discovery to explore close-in chemical space. Late stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF is still an open challenge in the field. Numerous efforts from chemoinformatics and machine learning (ML) groups have made significant strides in this area. However, it is arduous to isolate and characterize the multitude of LSF products generated, limiting available data and hindering pure ML approaches.
Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning
Lengerich, Chris, Synnaeve, Gabriel, Zhang, Amy, Leather, Hugh, Shuster, Kurt, Charton, François, Redwood, Charysse
Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional representations from hand-designed and context-independent self-supervised losses (multi-view) still adapts relatively slowly to the real world, which contains many non-IID subspaces requiring rapid distribution shift in both time and spatial attention patterns at varying levels of abstraction. In contrast, supervised language model cascades have shown the flexibility to adapt to many diverse manifolds, and hints of self-learning needed for autonomous task transfer. However, to date, transfer methods for language models like few-shot learning and fine-tuning still require human supervision and transfer learning using self-learning methods has been underexplored. We propose a self-supervised loss policy called contrastive distillation which manifests latent variables with high mutual information with both source and target tasks from weights to tokens. We show how this outperforms common methods of transfer learning and suggests a useful design axis of trading off compute for generalizability for online transfer. Contrastive distillation is improved through sampling from memory and suggests a simple algorithm for more efficiently sampling negative examples for contrastive losses than random sampling.
Tackling Data Scarcity with Transfer Learning: A Case Study of Thickness Characterization from Optical Spectra of Perovskite Thin Films
Tian, Siyu Isaac Parker, Ren, Zekun, Venkataraj, Selvaraj, Cheng, Yuanhang, Bash, Daniil, Oviedo, Felipe, Senthilnath, J., Chellappan, Vijila, Lim, Yee-Fun, Aberle, Armin G., MacLeod, Benjamin P, Parlane, Fraser G. L., Berlinguette, Curtis P., Li, Qianxiao, Buonassisi, Tonio, Liu, Zhe
Transfer learning is becoming an increasingly important tool for handling the data scarcity often encountered in machine learning. In high-throughput thickness characterization, a downstream process of the high-throughput optimization of optoelectronic thin films with autonomous workflows, data scarcity occurs especially for new materials. To achieve high-throughput thickness characterization, we propose a machine learning model called thicknessML that predicts thickness from UV-Vis spectrophotometry input, together with an overarching transfer learning workflow. We demonstrate the transfer learning workflow from a generic source domain of band-gapped materials to a specific target domain of perovskite materials, where the target domain data come only from a limited number (18) of refractive indices from the literature. The target domain can be easily extended to other material classes with a few literature data. Defining thickness prediction accuracy as within-10% deviation, thicknessML achieves 92.2% accuracy (with a deviation of 3.6%) with transfer learning, compared to 81.8% (with a deviation of 11.7%) without it (lower mean and larger standard deviation). Experimental validation on six deposited perovskite films also corroborates the efficacy of the proposed workflow by yielding a 10.5% mean absolute percentage error (MAPE).
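The two metrics quoted above can be made concrete with a short sketch. It assumes that "within-10% deviation" means a relative error |predicted - true| / true of at most 0.10. The function names and the thickness values are illustrative only; they are not taken from thicknessML or from the paper's data.

```python
def within_tolerance_accuracy(y_true, y_pred, tol=0.10):
    """Fraction of predictions whose relative error is at most `tol`."""
    hits = sum(abs(p - t) / abs(t) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical film thicknesses in nm (not data from the paper).
true_nm = [100.0, 200.0, 400.0, 500.0]
pred_nm = [105.0, 230.0, 380.0, 500.0]

accuracy = within_tolerance_accuracy(true_nm, pred_nm)
error = mape(true_nm, pred_nm)
print(f"within-10% accuracy: {accuracy:.2f}, MAPE: {error:.2f}%")
```

In this toy case three of the four predictions fall inside the 10% band, so the accuracy metric and the MAPE report different aspects of the same errors: one counts hits, the other averages the error size.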
COVID-19 Detection Based on Self-Supervised Transfer Learning Using Chest X-Ray Images
Li, Guang, Togo, Ren, Ogawa, Takahiro, Haseyama, Miki
Purpose: Considering the number of patients screened due to the COVID-19 pandemic, computer-aided detection has strong potential for improving clinical workflow efficiency and reducing the incidence of infections among radiologists and healthcare providers. Since many confirmed COVID-19 cases present radiological findings of pneumonia, radiologic examinations can be useful for fast detection. Therefore, chest radiography can be used to screen quickly for COVID-19 during patient triage, thereby determining the priority of patient care to help saturated medical facilities in a pandemic situation. Methods: In this paper, we propose a new learning scheme called self-supervised transfer learning for detecting COVID-19 from chest X-ray (CXR) images. We compared six self-supervised learning (SSL) methods (Cross, BYOL, SimSiam, SimCLR, PIRL-jigsaw, and PIRL-rotation) with the proposed method. Additionally, we compared six pretrained DCNNs (ResNet18, ResNet50, ResNet101, CheXNet, DenseNet201, and InceptionV3) with the proposed method. We provide quantitative evaluation on the largest open COVID-19 CXR dataset and qualitative results for visual inspection. Results: Our method achieved a harmonic mean (HM) score of 0.985, AUC of 0.999, and four-class accuracy of 0.953. We also used the visualization technique Grad-CAM++ to generate visual explanations of different classes of CXR images with the proposed method to increase interpretability. Conclusions: Our method shows that the knowledge learned from natural images using transfer learning is beneficial for SSL of the CXR images and boosts the performance of representation learning for COVID-19 detection. Our method promises to reduce the incidence of infections among radiologists and healthcare providers.
Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck
Eom, Youngsik, Lee, Yeonghyeon, Um, Ji Sub, Kim, Hoirin
Recent advances in sophisticated synthetic speech generated by text-to-speech (TTS) or voice conversion (VC) systems pose threats to existing automatic speaker verification (ASV) systems. Since such synthetic speech is generated by diverse algorithms, the ability to generalize from limited training data is indispensable for a robust anti-spoofing system. In this work, we propose a transfer learning scheme based on the wav2vec 2.0 pretrained model with a variational information bottleneck (VIB) for the speech anti-spoofing task. Evaluation on the ASVspoof 2019 logical access (LA) database shows that our method improves the performance of distinguishing unseen spoofed and genuine speech, outperforming current state-of-the-art anti-spoofing systems. Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset settings of the anti-spoofing task, demonstrating that our system is also robust in terms of data size and data distribution.
Covariance-Generalized Matching Component Analysis for Data Fusion and Transfer Learning
Lorenzo, Nick, O'Rourke, Sean, Scarnati, Theresa
The matching component analysis (MCA) transfer learning technique was originally developed as a data augmentation strategy for building large, representative machine learning training sets within a data-limited environment [1]. Specifically, MCA maps a training domain and a testing domain into a low-dimensional, common domain using only a small number of matched train-test image pairs. These maps minimize the expected distance between train-test image pairs within the common domain, subject to an identity matrix covariance constraint and an affine linear structure. The training domain's optimal affine linear transformation - encoded with information from the matched train-test image pairs - is then applied to a large number of unmatched training images, resulting in a large number of common-domain image representations to be used as training inputs. We are interested in extending the MCA application space to the fusion of data acquired from two different modalities.
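One way to write down the optimization described above is sketched here. The notation is ours, not the paper's: $X_1$ and $X_2$ denote a matched training-domain and testing-domain image pair, $g_i(x) = A_i x + b_i$ are the affine maps into a $k$-dimensional common domain, and the squared Euclidean norm is an assumed choice of distance. Applying the covariance constraint to both maps is also an assumption.

```latex
\begin{aligned}
\min_{A_1,\, b_1,\, A_2,\, b_2} \quad
  & \mathbb{E}\,\bigl\| (A_1 X_1 + b_1) - (A_2 X_2 + b_2) \bigr\|_2^2 \\
\text{subject to} \quad
  & \operatorname{Cov}(A_i X_i + b_i) = I_k, \qquad i = 1, 2 .
\end{aligned}
```

Without a constraint of this kind, the trivial solution $A_1 = A_2 = 0$ would drive the objective to zero while discarding all information in the images; fixing the covariance to the identity rules that out.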
Text and Image Classification for Craigslist using GloVe and MobileNet -- Transfer Learning
Craigslist is an American classified advertisements website with sections for housing, jobs, services (beauty, legal, health, etc.), and products for sale. Anyone can list a product or service on Craigslist for free, and those interested can contact the poster. However, many listings on Craigslist are not properly classified and are posted in incorrect sections. A category on the website that is of particular interest to us is the 'Bikes' section. Like all other sections on the website, the 'Bikes' section of Craigslist has many listings that do not belong there.
ProQA: Structural Prompt-based Pre-training for Unified Question Answering
Zhong, Wanjun, Gao, Yifan, Ding, Ning, Qin, Yujia, Liu, Zhiyuan, Zhou, Ming, Wang, Jiahai, Yin, Jian, Duan, Nan
Question Answering (QA) is a longstanding challenge in natural language processing. Existing QA works mostly focus on specific question types, knowledge domains, or reasoning skills. This specialization in QA research hinders systems from modeling commonalities between tasks and from generalizing to wider applications. To address this issue, we present ProQA, a unified QA paradigm that solves various tasks through a single model. ProQA takes a unified structural prompt as the bridge and improves the QA-centric ability by structural prompt-based pre-training. Through a structurally designed prompt-based input schema, ProQA concurrently models the knowledge generalization for all QA tasks while keeping the knowledge customization for every specific QA task. Furthermore, ProQA is pre-trained with a structural prompt-formatted, large-scale synthesized corpus, which empowers the model with the commonly required QA ability. Experimental results on 11 QA benchmarks demonstrate that ProQA consistently boosts performance in full data fine-tuning, few-shot learning, and zero-shot testing scenarios. Furthermore, ProQA exhibits strong ability in both continual learning and transfer learning by taking advantage of the structural prompt.
Transfer Learning Enhanced DeepONet for Long-Time Prediction of Evolution Equations
Xu, Wuzhe, Lu, Yulong, Wang, Li
Deep operator network (DeepONet) has demonstrated great success in various learning tasks, including learning solution operators of partial differential equations. In particular, it provides an efficient approach to predicting the evolution equations over a finite time horizon. Nevertheless, the vanilla DeepONet suffers from stability degradation in long-time prediction. This paper proposes a transfer-learning aided DeepONet to enhance the stability. Our idea is to use transfer learning to sequentially update the DeepONets as the surrogates for propagators learned in different time frames. The evolving DeepONets can better track the varying complexities of the evolution equations, while needing to be updated only by efficient training of a tiny fraction of the operator networks. Through systematic experiments, we show that the proposed method not only improves the long-time accuracy of DeepONet while maintaining similar computational cost but also substantially reduces the sample size of the training set.
Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning
Gheini, Mozhdeh, Ma, Xuezhe, May, Jonathan
A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-efficient transfer learning by updating only a small set of additional parameters while keeping the parameters of the pretrained language model frozen. Although these methods have proven effective, no existing studies examine whether and how knowledge of the downstream fine-tuning approach should affect the pretraining stage. In this work, we show that taking the ultimate choice of fine-tuning method into consideration boosts the performance of parameter-efficient fine-tuning. By relying on optimization-based meta-learning using MAML with certain modifications for our distinct purpose, we prime the pretrained model specifically for parameter-efficient fine-tuning, resulting in gains of up to 1.7 points on cross-lingual NER fine-tuning. Our ablation settings and analyses further reveal that the tweaks we introduce in MAML are crucial for the attained gains.
Figure 1: Transfer learning for NLP pipeline; the shaded block is our contribution. Conventional transfer practice (dashed arrows) does not differentiate between full fine-tuning and parameter-efficient fine-tuning in any way. This work proposes a meta-learning solution to further modify and prime a pretrained model parameters
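The idea of updating only a small set of additional parameters while the pretrained parameters stay frozen can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "pretrained model" is a three-weight linear function, the "adapter" is a hypothetical scale and shift on its output, and training is plain gradient descent. It does not implement the MAML-based priming that the paper proposes.

```python
# Frozen "pretrained" parameters of a linear model (never updated below).
pretrained_w = [2.0, -1.0, 0.5]

# Small set of additional trainable parameters: a hypothetical adapter that
# rescales and shifts the frozen model's output.
adapter = {"scale": 1.0, "shift": 0.0}


def base_output(x):
    """Output of the frozen pretrained model."""
    return sum(w * xi for w, xi in zip(pretrained_w, x))


def predict(x):
    """Frozen model output passed through the trainable adapter."""
    return adapter["scale"] * base_output(x) + adapter["shift"]


def mse(data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# Toy downstream task: targets are 3 * base_output(x) + 1.
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0], [1.0, 1.0, 1.0]]
data = [(x, 3.0 * base_output(x) + 1.0) for x in inputs]

frozen_copy = list(pretrained_w)
loss_before = mse(data)

learning_rate = 0.05
for _ in range(2000):
    grad_scale = 0.0
    grad_shift = 0.0
    for x, y in data:
        h = base_output(x)
        err = predict(x) - y
        # Gradients of the mean squared error w.r.t. the adapter only.
        grad_scale += 2.0 * err * h / len(data)
        grad_shift += 2.0 * err / len(data)
    adapter["scale"] -= learning_rate * grad_scale
    adapter["shift"] -= learning_rate * grad_shift

loss_after = mse(data)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.2e}")
```

Only two numbers are trained here, yet the downstream loss drops to near zero because the frozen model already computes a useful feature. That is the property that makes the choice of fine-tuning method relevant to how the pretrained parameters should be prepared.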