Transfer Learning


Transfer Learning for Sequences via Learning to Collocate

arXiv.org Artificial Intelligence

Transfer learning aims to address data sparsity in a target domain by applying information from a source domain. Given a sequence (e.g., a natural language sentence), transfer learning, usually enabled by a recurrent neural network (RNN), models the transfer of sequential information. An RNN uses a chain of repeating cells to model sequence data. However, previous studies of neural-network-based transfer learning simply represent the whole sentence by a single vector, which is infeasible for seq2seq and sequence labeling. Meanwhile, such layer-wise transfer learning mechanisms lose the fine-grained cell-level information from the source domain. In this paper, we propose aligned recurrent transfer (ART) to achieve cell-level information transfer. ART operates under the pre-training framework. Each cell attentively accepts transferred information from a set of positions in the source domain. Therefore, ART learns cross-domain word collocations in a more flexible way. We conducted extensive experiments on both sequence labeling tasks (POS tagging, NER) and sentence classification (sentiment analysis). ART outperforms the state of the art in all experiments.
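
The idea of a cell attending over a set of source positions can be illustrated with generic dot-product attention. This is a minimal sketch only, not ART's actual formulation, which the abstract does not specify; the function name `attend` and the toy vectors are hypothetical.

```python
import math

def attend(target_state, source_states):
    """Return (weights, context) for one target cell attending over source positions."""
    # Dot-product score between the target cell state and each source state.
    scores = [sum(t * s for t, s in zip(target_state, src)) for src in source_states]
    # Numerically stable softmax over source positions.
    peak = max(scores)
    exps = [math.exp(sc - peak) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the source states.
    dim = len(target_state)
    context = [sum(w * src[i] for w, src in zip(weights, source_states)) for i in range(dim)]
    return weights, context

# Toy example (hypothetical values): one target cell, three source positions.
target = [1.0, 0.0]
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
weights, context = attend(target, sources)
```

Source positions whose states resemble the target cell's state receive larger weights, so the transferred information is chosen per cell rather than once per sentence.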


Transfer Learning for Non-Intrusive Load Monitoring

arXiv.org Machine Learning

Non-intrusive load monitoring (NILM) is a technique to recover source appliances from only the recorded mains in a household. NILM is unidentifiable and thus a challenging problem, because the inferred power value of an appliance given only the mains may not be unique. To mitigate this unidentifiability, various methods incorporating domain knowledge into NILM have been proposed and shown to be effective experimentally. Among these methods, deep neural networks have recently been shown to perform best. Arguably, the recently proposed sequence-to-point (seq2point) learning is promising for NILM. However, the experiments were only carried out within the same data domain. It is not clear whether the method can be generalised or transferred to different domains, e.g., when the test data are drawn from a different country than the training data. We address this issue in the paper and propose two transfer learning schemes, i.e., appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, our results show that the latent features learnt by a `complex' appliance, e.g., washing machine, can be transferred to a `simple' appliance, e.g., kettle. For CTL, our conclusion is that seq2point learning is transferable. Precisely, when the training and test data are in a similar domain, seq2point learning can be directly applied to the test data without fine tuning; when the training and test data are in different domains, seq2point learning needs fine tuning before being applied to the test data. Interestingly, we show that only the fully connected layers need fine tuning for transfer learning.
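
The abstract does not define seq2point, so the following is a sketch under the usual reading: each window of mains readings is paired with the appliance reading at the window's midpoint. The helper `make_seq2point_pairs` and the toy readings are hypothetical, and no padding is applied at the sequence edges.

```python
def make_seq2point_pairs(mains, appliance, window):
    """Pair each mains window with the appliance reading at the window midpoint."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that a midpoint exists")
    if len(mains) != len(appliance):
        raise ValueError("mains and appliance must be aligned in time")
    half = window // 2
    pairs = []
    for start in range(len(mains) - window + 1):
        segment = mains[start:start + window]
        # Target is a single point: the appliance value at the window centre.
        pairs.append((segment, appliance[start + half]))
    return pairs

# Toy readings (hypothetical values), aligned sample by sample.
mains = [1, 2, 3, 4, 5, 6, 7]
kettle = [10, 20, 30, 40, 50, 60, 70]
pairs = make_seq2point_pairs(mains, kettle, window=3)
```

A network trained on such pairs maps a sequence to a point. Under the abstract's finding, transferring it to a different domain would mean retraining only its fully connected layers on pairs built from the new data.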


Transfusion: Understanding Transfer Learning with Applications to Medical Imaging

arXiv.org Machine Learning

With the increasingly varied applications of deep learning, transfer learning has emerged as a critically important technique. However, the central question of how much feature reuse in transfer is the source of benefit remains unanswered. In this paper, we present an in-depth analysis of the effects of transfer, focusing on medical imaging, which is a particularly intriguing setting. Here, transfer learning is extremely popular, but data differences between pretraining and fine-tuning are considerable, reiterating the question of what is transferred. With experiments on two large scale medical imaging datasets, and CIFAR-10, we find transfer has almost negligible effects on performance, but significantly helps convergence speed. However, in all of these settings, convergence without transfer can be sped up dramatically by using only mean and variance statistics of the pretrained weights. Visualizing the lower layer filters shows that models trained from random initialization do not learn Gabor filters on medical images. We use CCA (canonical correlation analysis) to study the learned representations of the different models, finding that pretrained models are surprisingly similar to random initialization at higher layers. This similarity is evidenced both through model learning dynamics and a transfusion experiment, which explores the convergence speed using a subset of pretrained weights.
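
One plausible reading of "using only mean and variance statistics of the pretrained weights" is to draw fresh weights for a layer from a Gaussian that matches that layer's pretrained mean and variance. The sketch below assumes this reading; the function name `mean_var_init` and the weight values are hypothetical.

```python
import random
import statistics

def mean_var_init(pretrained_layer, rng):
    """Sample new weights i.i.d. from a Gaussian matching the layer's mean and variance."""
    mu = statistics.fmean(pretrained_layer)
    sigma = statistics.pstdev(pretrained_layer)
    # Only two numbers (mu, sigma) are carried over from the pretrained layer.
    return [rng.gauss(mu, sigma) for _ in pretrained_layer]

rng = random.Random(0)
pretrained = [0.12, -0.30, 0.05, 0.41, -0.08]  # hypothetical flattened layer weights
fresh = mean_var_init(pretrained, rng)
# With zero variance, every sampled weight equals the mean.
constant = mean_var_init([0.5, 0.5, 0.5], rng)
```

The new weights share no learned features with the pretrained ones, only their scale, which is what makes the reported convergence speed-up notable.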


Size Independent Neural Transfer for RDDL Planning

arXiv.org Machine Learning

Neural planners for RDDL MDPs produce deep reactive policies in an offline fashion. These scale well with large domains, but are sample inefficient and time-consuming to train from scratch for each new problem. To mitigate this, recent work has studied neural transfer learning, so that a generic planner trained on other problems of the same domain can rapidly transfer to a new problem. However, this approach only transfers across problems of the same size. We present the first method for neural transfer of RDDL MDPs that can transfer across problems of different sizes. Our architecture has two key innovations to achieve size independence: (1) a state encoder, which outputs a fixed length state embedding by max pooling over varying number of object embeddings, (2) a single parameter-tied action decoder that projects object embeddings into action probabilities for the final policy. On the two challenging RDDL domains of SysAdmin and Game Of Life, our approach powerfully transfers across problem sizes and has superior learning curves over training from scratch.
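
The two components can be sketched as follows, assuming each object is already represented by a fixed-size embedding. The abstract does not give the real architecture's details, so the function names, the shared weight vector, and the softmax over objects are illustrative assumptions.

```python
import math

def encode_state(object_embeddings):
    """Max-pool over objects: output length depends only on the embedding size."""
    return [max(column) for column in zip(*object_embeddings)]

def decode_actions(object_embeddings, shared_weights):
    """Score every object with the same weights, then softmax over objects."""
    scores = [sum(w * x for w, x in zip(shared_weights, emb)) for emb in object_embeddings]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings: a 2-object problem and a 5-object problem.
shared = [0.5, -1.0, 0.25]
small = [[0.1, 0.2, 0.3], [0.4, 0.0, -0.1]]
large = small + [[0.9, -0.5, 0.2], [0.0, 0.7, 0.6], [-0.3, 0.1, 0.8]]

small_state = encode_state(small)
large_state = encode_state(large)
large_policy = decode_actions(large, shared)
```

Because pooling fixes the state size and the decoder reuses one weight vector per object, the parameter shapes do not depend on the number of objects, so the same parameters apply to both problems.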


Smart City Development With Urban Transfer Learning

IEEE Computer

The governments of many cities just starting smart city development will face a critical cold-start problem: how to develop a new smart city service with limited data. We investigate the common process of urban transfer learning, i.e., leveraging transfer learning to accelerate smart city development, and also provide city planners and relevant practitioners with guidelines for applying this novel learning paradigm.


Embodied Multimodal Multitask Learning

arXiv.org Machine Learning

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question answering. In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks. The proposed model uses a novel Dual-Attention unit to disentangle the knowledge of words in the textual representations and visual concepts in the visual representations, and align them with each other. This disentangled task-invariant alignment of representations facilitates grounding and knowledge transfer across both tasks. We show that the proposed model outperforms a range of baselines on both tasks in simulated 3D environments. We also show that this disentanglement of representations makes our model modular, interpretable, and allows for transfer to instructions containing new words by leveraging object detectors.


Parameter-Efficient Transfer Learning for NLP

arXiv.org Machine Learning

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
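
The parameter-efficiency argument is simple arithmetic, sketched below. The 3.6% per-task figure comes from the abstract; the task count of 10 and the function name `relative_storage` are assumptions chosen for illustration.

```python
def relative_storage(num_tasks, per_task_fraction):
    """Total parameters stored, as a multiple of one base model's size."""
    # One shared frozen base model plus a small task-specific increment per task.
    return 1.0 + num_tasks * per_task_fraction

num_tasks = 10  # hypothetical number of downstream tasks
adapters = relative_storage(num_tasks, 0.036)
# Full fine-tuning: one complete model copy per task, with no shared base.
full_fine_tuning = num_tasks * 1.0
```

Under these assumptions, adapters store roughly 1.36 times the base model in total, whereas full fine-tuning stores 10 times, and the gap widens linearly with each added task.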


ML for Flood Forecasting at Scale

arXiv.org Machine Learning

Effective riverine flood forecasting at scale is hindered by a multitude of factors, most notably the need to rely on human calibration in current methodology, the limited amount of data for a specific location, and the computational difficulty of building continent/global level models that are sufficiently accurate. Machine learning (ML) is primed to be useful in this scenario: learned models often surpass human experts in complex high-dimensional scenarios, and the framework of transfer or multitask learning is an appealing solution for leveraging local signals to achieve improved global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction. Floods are the most common and deadly natural disaster in the world. Every year, floods cause from thousands to tens of thousands of fatalities [1, 22, 2, 21, 14], affect hundreds of millions of people [14, 21, 2], and cause tens of billions of dollars worth of damages [1, 2]. These numbers have only been increasing in recent decades [23]. Indeed, the UN charter notes floods to be one of the key motivators for formulating the sustainable development goals (SDGs), and directly challenges us: "They knew that earthquakes and floods were inevitable, but that the high death tolls were not."


Evaluation of Transfer Learning for Classification of: (1) Diabetic Retinopathy by Digital Fundus Photography and (2) Diabetic Macular Edema, Choroidal Neovascularization and Drusen by Optical Coherence Tomography

arXiv.org Machine Learning

Deep learning has been successfully applied to a variety of image classification tasks. There has been keen interest to apply deep learning in the medical domain, particularly specialties that heavily utilize imaging, such as ophthalmology. One issue that may hinder application of deep learning to the medical domain is the vast amount of data necessary to train deep neural networks (DNNs). Because of regulatory and privacy issues associated with medicine, and the generally proprietary nature of data in medical domains, obtaining large datasets to train DNNs is a challenge, particularly in the ophthalmology domain. Transfer learning is a technique developed to address the issue of applying DNNs for domains with limited data. Prior reports on transfer learning have examined custom networks to fully train or used a particular DNN for transfer learning. However, to the best of my knowledge, no work has systematically examined a suite of DNNs for transfer learning for classification of diabetic retinopathy, diabetic macular edema, and two key features of age-related macular degeneration. This work attempts to investigate transfer learning for classification of these ophthalmic conditions. Part I gives a condensed overview of neural networks and the DNNs under evaluation. Part II gives the reader the necessary background concerning diabetic retinopathy and prior work on classification using retinal fundus photographs. The methodology and results of transfer learning for diabetic retinopathy classification are presented, showing that transfer learning towards this domain is feasible, with promising accuracy. Part III gives an overview of diabetic macular edema, choroidal neovascularization and drusen (features associated with age-related macular degeneration), and presents results for transfer learning evaluation using optical coherence tomography to classify these entities.


Human-centric Transfer Learning Explanation via Knowledge Graph [Extended Abstract]

arXiv.org Machine Learning

Transfer learning, which aims to use knowledge learned from one problem (source domain) to solve a different but related problem (target domain), has attracted wide research attention. However, current transfer learning methods are mostly uninterpretable, especially to people without ML expertise. In this extended abstract, we briefly introduce two knowledge graph (KG) based frameworks for human-understandable transfer learning explanation. The first explains the transferability of features learned by a Convolutional Neural Network (CNN) from one domain to another through pre-training and fine-tuning, while the second justifies the model of a target domain predicted by models from multiple source domains in zero-shot learning (ZSL). Both methods use a KG and its reasoning capability to provide rich and human-understandable explanations of the transfer procedure.