Transfer Learning
An Introduction to Lifelong Supervised Learning
Sodhani, Shagun, Faramarzi, Mojtaba, Mehta, Sanket Vaibhav, Malviya, Pranshu, Abdelsalam, Mohamed, Janarthanan, Janarthanan, Chandar, Sarath
This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2, which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), and describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save a subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7.
A Cross-City Federated Transfer Learning Framework: A Case Study on Urban Region Profiling
Chen, Gaode, Su, Yijun, Zhang, Xinghua, Hu, Anmin, Chen, Guochun, Feng, Siyuan, Xiang, Ji, Zhang, Junbo, Zheng, Yu
Data insufficiency problems (i.e., missing data and label scarcity) caused by inadequate services and infrastructures or imbalanced development levels of cities have seriously affected urban computing tasks in real scenarios. Prior transfer learning methods offer an elegant solution to data insufficiency, but they address only one kind of insufficiency and fail to consider both. In addition, most previous cross-city transfer methods overlook inter-city data privacy, which is a public concern in practical applications. To address these problems, we propose a novel Cross-city Federated Transfer Learning framework (CcFTL) that copes with both data insufficiency and privacy. Concretely, CcFTL transfers relational knowledge from multiple data-rich source cities to the target city. In addition, the model parameters specific to the target task are first trained on the source data and then fine-tuned to the target city by parameter transfer. With our adaptation of federated training and homomorphic encryption settings, CcFTL can effectively deal with the data privacy problem among cities. We take urban region profiling as an application of smart cities and evaluate the proposed method with a real-world study. The experiments demonstrate the notable superiority of our framework over several competitive state-of-the-art methods.
Review -- What Makes Instance Discrimination Good for Transfer Learning?
The contrastive network provides more complete reconstructions spatially. The images are reconstructed at the correct scale and location. A possible explanation is that in order to make one instance unique from all other instances, the network strives to preserve as much information as possible. The supervised network loses information over large regions in the images, likely because its features are mainly attuned to the most discriminative object parts, which are central to the classification task, rather than objects and images as a whole. The resulting loss of information may prevent the supervised network from detecting the full envelope of the object.
A Study on Robustness to Perturbations for Representations of Environmental Sound
Srivastava, Sangeeta, Wu, Ho-Hsiang, Rulff, Joao, Fuentes, Magdalena, Cartwright, Mark, Silva, Claudio, Arora, Anish, Bello, Juan Pablo
Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. Therefore, for a given data domain, it is unclear how the representations would be affected by the variations caused by myriad microphones' range and acoustic conditions -- commonly known as channel effects. We aim to extend HEAR to evaluate invariance to channel effects in this work. To accomplish this, we imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent. Combined with the downstream performance, it helps us make a more informed prediction of how robust the embeddings are to the channel effects. We evaluate two embeddings, YAMNet and OpenL3, on monophonic (UrbanSound8K) and polyphonic (SONYC-UST) urban datasets. We show that one distance measure does not suffice in such task-independent evaluation. Although Fréchet Audio Distance (FAD) correlates with the trend of the performance drop in the downstream task most accurately, we show that we need to study FAD in conjunction with the other distances to get a clear understanding of the overall effect of the perturbation. In terms of the embedding performance, we find OpenL3 to be more robust than YAMNet, which aligns with the HEAR evaluation.
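As a rough illustration of the kind of distance measure discussed in this abstract, the sketch below computes a Fréchet distance between two sets of embeddings by fitting a Gaussian to each set. It is not the authors' code: the random arrays only stand in for YAMNet or OpenL3 embeddings, the constant offset is a hypothetical stand-in for a channel perturbation, and `frechet_distance` is an illustrative name.

```python
import numpy as np

def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two (n_samples, dim) embedding sets."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (clipping guards against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 4))   # placeholder for embeddings of clean audio
perturbed = clean + 0.5             # hypothetical perturbation: a constant shift

fd_same = frechet_distance(clean, clean)
fd_shift = frechet_distance(clean, perturbed)
print(fd_same, fd_shift)
```

Because the shift leaves the covariance unchanged, only the mean term contributes here. A perturbation that changes the spread of the embeddings would show up in the covariance terms instead, which is one reason a single number can hide what kind of shift occurred.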
Federated and Transfer Learning: A Survey on Adversaries and Defense Mechanisms
Hallaji, Ehsan, Razavi-Far, Roozbeh, Saif, Mehrdad
The advent of federated learning has facilitated large-scale data exchange amongst machine learning models while maintaining privacy. Despite its brief history, federated learning is rapidly evolving to make wider use more practical. One of the most significant advancements in this domain is the incorporation of transfer learning into federated learning, which overcomes fundamental constraints of primary federated learning, particularly in terms of security. This chapter presents a comprehensive survey of the intersection of federated and transfer learning from a security point of view. The main goal of this study is to uncover potential vulnerabilities that might compromise the privacy and performance of systems that use federated and transfer learning, along with the defense mechanisms against them.
Transfer Learning with Fine-Tuning on MobileNet and GRAD-CAM for Bones Abnormalities Diagnosis
Osteoarthritis is a common medical condition. Unfortunately, despite the support of X-ray imaging technology in diagnosis, the accuracy of diagnostic results still depends on human factors. Furthermore, when errors do occur, they are often detected late, leading to a waste of time and money, and even to disability for the patient. This study deploys and evaluates transfer learning techniques for classifying bone X-ray images as abnormal or normal, using 17,367 images from the MUsculoskeletal RAdiographs (MURA) dataset. It then applies explanation techniques such as Gradient-weighted Class Activation Mapping (GRAD-CAM) to highlight regions of the images that may signal bone anomalies. MobileNet with hyper-parameter fine-tuning can reach an accuracy of 0.84 in abnormal and normal bone classification tasks on the wrist, humerus, and elbow.
Why academic research in AI is a total waste of time
Jeremy Howard, a creator of fast.ai and an ex-President of Kaggle, says that most of the research in the deep learning world is a total waste of time. He explains why that is and what is currently understudied, namely active learning and transfer learning. Active learning and transfer learning are further elaborated in this blog post. When asked the question "What's wrong with artificial intelligence?", ... However, when you dig into the question, the AI industry is fighting its own demons.
Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains
Tanaka, Yusuke, Tanaka, Toshiyuki, Iwata, Tomoharu, Kurashima, Takeshi, Okawa, Maya, Akagi, Yasunori, Toda, Hiroyuki
Aggregate data often appear in various fields such as socio-economics and public security. The aggregate data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have various granularities depending on attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article offers a multi-output Gaussian process (MoGP) model that infers functions for attributes using multiple aggregate datasets of respective granularities. In the proposed model, the function for each attribute is assumed to be a dependent GP modeled as a linear mixing of independent latent GPs. We design an observation model with an aggregation process for each attribute; the process is an integral of the GP over the corresponding support. We also introduce a prior distribution of the mixing weights, which allows a knowledge transfer across domains (e.g., cities) by sharing the prior. This is advantageous in such a situation where the spatially aggregated dataset in a city is too coarse to interpolate; the proposed model can still make accurate predictions of attributes by utilizing aggregate datasets in other cities. The inference of the proposed model is based on variational Bayes, which enables one to learn the model parameters using the aggregate datasets from multiple domains. The experiments demonstrate that the proposed model outperforms in the task of refining coarse-grained aggregate data on real-world datasets: Time series of air pollutants in Beijing and various kinds of spatial datasets from New York City and Chicago.
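To make the two modeling ingredients in this abstract concrete, the sketch below samples independent latent functions, mixes them linearly into one function per attribute, and then aggregates each attribute over regions of a different granularity. It is only a toy under stated assumptions, not the authors' model: the one-dimensional input, the RBF kernel, the equal-width regions, the use of a regional average as the discretized integral, and all names (`sample_latent_gps`, `aggregate`, `mixing`) are illustrative choices, and no inference is performed.

```python
import numpy as np

def rbf_kernel(x, length_scale=0.2):
    """Squared-exponential kernel matrix for 1-D inputs (an assumed kernel choice)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def sample_latent_gps(x, n_latent, rng):
    """Draw n_latent independent GP samples on the grid x; returns (len(x), n_latent)."""
    cov = rbf_kernel(x) + 1e-6 * np.eye(len(x))  # jitter for numerical stability
    return np.linalg.cholesky(cov) @ rng.normal(size=(len(x), n_latent))

def aggregate(values, n_regions):
    """Average a fine-grid function over equal-width regions (discretized integral)."""
    return values.reshape(n_regions, -1).mean(axis=1)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 120)
latent = sample_latent_gps(x, n_latent=2, rng=rng)

# Rows index attributes, columns index latent functions (hypothetical weights).
mixing = np.array([[1.0, 0.5],
                   [-0.3, 0.8]])
fine = latent @ mixing.T            # one dependent function per attribute

coarse_attr0 = aggregate(fine[:, 0], n_regions=4)    # coarse supports
coarse_attr1 = aggregate(fine[:, 1], n_regions=12)   # finer supports
print(coarse_attr0.shape, coarse_attr1.shape)
```

Since both mixing and aggregation are linear, aggregating a mixed function gives the same result as mixing the aggregated latent functions, which is what keeps the aggregated observations jointly Gaussian in a model of this kind.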
PAC-Net: A Model Pruning Approach to Inductive Transfer Learning
Myung, Sanghoon, Huh, In, Jang, Wonik, Choe, Jae Myung, Ryu, Jisu, Kim, Dae Sin, Kim, Kee-Eung, Jeong, Changwook
Inductive transfer learning aims to learn from a small amount of training data for the target task by utilizing a pre-trained model from the source task. Most strategies that involve large-scale deep learning models adopt initialization with the pre-trained model and fine-tuning for the target task. However, when using over-parameterized models, we can often prune the model without sacrificing the accuracy of the source task. This motivates us to adopt model pruning for transfer learning with deep learning models. In this paper, we propose PAC-Net, a simple yet effective approach for transfer learning based on pruning. PAC-Net consists of three steps: Prune, Allocate, and Calibrate (PAC). The main idea behind these steps is to identify essential weights for the source task, fine-tune on the source task by updating the essential weights, and then calibrate on the target task by updating the remaining redundant weights. Across an extensive and varied set of inductive transfer learning experiments, we show that our method achieves state-of-the-art performance by a large margin.
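The three steps described in this abstract can be sketched on a toy linear regression problem. This is a minimal illustration, not the PAC-Net implementation: magnitude-based pruning, zeroing the pruned weights, the linear model, the synthetic data, and the helper names `magnitude_mask` and `masked_gd` are all assumptions made for the example.

```python
import numpy as np

def magnitude_mask(weights, keep_ratio):
    """Boolean mask marking the largest-magnitude weights as 'essential'."""
    k = max(1, int(round(keep_ratio * weights.size)))
    mask = np.zeros(weights.size, dtype=bool)
    mask[np.argsort(np.abs(weights).ravel())[-k:]] = True
    return mask.reshape(weights.shape)

def masked_gd(weights, mask, x, y, lr=0.05, steps=200):
    """Gradient descent on mean squared error, updating only entries where mask is True."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        w -= lr * grad * mask
    return w

rng = np.random.default_rng(0)
x_src = rng.normal(size=(200, 8))   # plentiful source data (synthetic)
y_src = x_src @ np.array([3.0, -2.0, 1.5, 1.0, 0.0, 0.0, 0.0, 0.0])
x_tgt = rng.normal(size=(20, 8))    # scarce target data (synthetic)
y_tgt = x_tgt @ np.array([3.0, -2.0, 1.5, 1.0, 0.5, -0.5, 0.0, 0.0])

# Pre-train all weights on the source task.
w_pre = masked_gd(np.zeros(8), np.ones(8, dtype=bool), x_src, y_src)

# Prune: identify essential weights (here, by magnitude).
essential = magnitude_mask(w_pre, keep_ratio=0.5)

# Allocate: zero the redundant weights and fine-tune only the essential ones on the source task.
w_alloc = masked_gd(w_pre * essential, essential, x_src, y_src)

# Calibrate: update only the redundant weights on the target task.
w_final = masked_gd(w_alloc, ~essential, x_tgt, y_tgt)
print(essential, w_final)
```

The essential weights are frozen during calibration, so the source-task knowledge they carry is left intact while the redundant weights absorb what is specific to the target task.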
Evaluating histopathology transfer learning with ChampKit
Kaczmarzyk, Jakub R., Kurc, Tahsin M., Abousamra, Shahira, Gupta, Rajarsi, Saltz, Joel H., Koo, Peter K.
Histopathology remains the gold standard for diagnosis of various cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for various tasks, including immune cell detection and microsatellite instability classification. The state-of-the-art for each task often employs base architectures that have been pretrained for image classification on ImageNet. The standard approach to develop classifiers in histopathology tends to focus narrowly on optimizing models for a single task, not considering the aspects of modeling innovations that improve generalization across tasks. Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible benchmarking toolkit that consists of a broad collection of patch-level image classification tasks across different cancers. ChampKit enables a way to systematically document the performance impact of proposed improvements in models and methodology. ChampKit source code and data are freely accessible at https://github.com/kaczmarj/champkit.