Transfer Learning
Transfer Learning - Machine Learning's Next Frontier
In recent years, we have become increasingly good at training deep neural networks to learn a very accurate mapping from inputs to outputs (whether images, sentences, or label predictions) from large amounts of labeled data. What our models still frightfully lack is the ability to generalize to conditions that are different from the ones encountered during training. This matters every time you apply your model not to a carefully constructed dataset but to the real world. The real world is messy and contains an infinite number of novel scenarios, many of which your model has not encountered during training and for which it is in turn ill-prepared to make predictions. The ability to transfer knowledge to new conditions is generally known as transfer learning and is what we will discuss in the rest of this post. Over the course of this blog post, I will first contrast transfer learning with machine learning's most pervasive and successful paradigm, supervised learning. I will then outline reasons why transfer learning warrants our attention. Subsequently, I will give a more technical definition and detail different transfer learning scenarios. I will then provide examples of applications of transfer learning before delving into practical methods that can be used to transfer knowledge.
Ridesourcing Car Detection by Transfer Learning
Wang, Leye, Geng, Xu, Ke, Jintao, Peng, Chen, Ma, Xiaojuan, Zhang, Daqing, Yang, Qiang
Ridesourcing platforms like Uber and Didi are getting more and more popular around the world. However, unauthorized ridesourcing activities taking advantage of the sharing economy can greatly impair the healthy development of this emerging industry. As the first step to regulate on-demand ride services and eliminate the black market, we design a method to detect ridesourcing cars from a pool of cars based on their trajectories. Since licensed ridesourcing car traces are not openly available and may be completely missing in some cities due to legal issues, we turn to transferring knowledge from public transport open data, i.e., taxis and buses, to ridesourcing detection among ordinary vehicles. We propose a two-stage transfer learning framework. In Stage 1, we take taxi and bus data as input to learn a random forest (RF) classifier using trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we use the RF to label all the candidate cars. In Stage 2, leveraging the subset of high-confidence labels from the previous stage as input, we further learn a convolutional neural network (CNN) classifier for ridesourcing detection, and iteratively refine RF and CNN, as well as the feature set, via a co-training process. Finally, we use the resulting ensemble of RF and CNN to identify the ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus traces show that our transfer learning framework, with no need of a pre-labeled ridesourcing dataset, can achieve accuracy similar to that of supervised learning methods.
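The handoff between the two stages described above can be illustrated with a minimal sketch. It assumes the Stage 1 classifier outputs, for each candidate car, a probability of being a ridesourcing car; the function name `select_confident` and the thresholds 0.1 and 0.9 are hypothetical choices for illustration and are not taken from the paper.

```python
def select_confident(probabilities, low=0.1, high=0.9):
    """Keep only candidates the Stage 1 model is confident about.

    probabilities: Stage 1 scores, one per candidate car (assumed to be
    the predicted probability of the positive class).
    Returns {candidate_index: pseudo_label}; uncertain candidates are left out.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    pseudo_labels = {}
    for index, p in enumerate(probabilities):
        if p >= high:
            pseudo_labels[index] = 1  # confidently positive
        elif p <= low:
            pseudo_labels[index] = 0  # confidently negative
    return pseudo_labels


# Hypothetical Stage 1 scores for four candidate cars.
stage1_scores = [0.95, 0.50, 0.05, 0.90]
training_subset = select_confident(stage1_scores)
```

Only the confidently labeled candidates would then serve as training input for the Stage 2 classifier; the co-training refinement itself is not shown here.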
Transfer Learning in Intelligent Tutoring Systems — Results, Challenges and New Directions
Gress, Aubrey (University of California, Davis) | Folsom-Kovarik, J. T. (Soar Technology, Inc) | Davidson, Ian (University of California, Davis)
At the core of an intelligent tutoring system is the ability to estimate a student’s level of skill proficiency. However, making accurate skill estimates can require asking the student relatively many questions. We address this challenge by using “transfer learning,” a field of machine learning which uses data from related, but different, “source” domains to aid in learning in a poorly labeled “target” domain. Thus, to predict the skill of a student who hasn't answered many “target” skill questions, we use estimates of well-tested “source” skills. We explore settings where the student has answered no questions related to the target skill (the cold start setting) and those where she has answered a few (the warm start setting). We focus on the challenging situation where the domain expert has not identified the relationship between the skills. We find that the Ridge estimator is useful for transferring knowledge from source to target skills, outperforming nonparametric regression methods and a baseline which only uses student performance on target skill questions.
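As a rough illustration of the cold start setting, the sketch below fits a ridge estimator that predicts a target-skill score from source-skill estimates, using students for whom both are known. It is a generic closed-form ridge regression without an intercept, written in plain Python; the data, the function names, and the regularization strength are assumptions for illustration and do not reproduce the paper's setup.

```python
def solve_linear_system(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def ridge_fit(X, y, lam=1.0):
    """Return w minimizing ||Xw - y||^2 + lam * ||w||^2 (no intercept)."""
    d = len(X[0])
    # Normal equations: (X^T X + lam * I) w = X^T y
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(d)]
    return solve_linear_system(A, b)


def ridge_predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical data: each row holds two source-skill estimates for a student
# who has also been tested on the target skill (y).
source_skills = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_skill = [0.5, 0.5, 1.0]
weights = ridge_fit(source_skills, target_skill, lam=1.0)

# Cold start: a new student with source estimates only.
estimate = ridge_predict(weights, [0.8, 0.6])
```

The penalty `lam` shrinks the weights toward zero, which helps when few students have been tested on the target skill; in the warm start setting the student's own few target answers could be added as further evidence.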
How to Invent the Cognitive Stack of Intelligence – Intuition Machine – Medium
Kevin Kelly (founding editor of Wired magazine) just wrote a near disaster of an article, "The AI Cargo Cult: The Myth of Superhuman AI." Kelly begins by attempting to tear down the assumptions of the superhuman AI hypothesis. You can read the article in more detail, but also make sure you read the comments. As I began to write this, I was going to refute each argument in detail. However, after a bit of thought, I concluded that Kelly's arguments are so lacking in merit that it's not worth the effort to refute them. So I will just point you to the comments on his article, which should be sufficient to explain Kelly's mistakes. The comments are a treasure trove of ideas on what's actually more important.
Learning to Learn from Artificial Intelligence
I bet this sounds strange. Why should we learn from artificial intelligence when artificial intelligence is in fact trying hard to become like us? Isn't our ability to think broadly and abstractly one of the things that in fact gives us an edge over machines? After all, aren't machines simply programmed to do one (or a few) things repeatedly? Sure, they can crunch ridiculous amounts of data, but they do so by following instructions given by us. Where is the learning in all of this?
Transfer Learning by Asymmetric Image Weighting for Segmentation across Scanners
Cheplygina, Veronika, van Opbroek, Annegreet, Ikram, M. Arfan, Vernooij, Meike W., de Bruijne, Marleen
Supervised learning has been very successful for automatic segmentation of images from a single scanner. However, several papers report deteriorated performance when using classifiers trained on images from one scanner to segment images from other scanners. We propose a transfer learning classifier that adapts to differences between training and test images. This method uses a weighted ensemble of classifiers trained on individual images. The weight of each classifier is determined by the similarity between its training image and the test image. We examine three unsupervised similarity measures, which can be used in scenarios where no labeled data from a newly introduced scanner or scanning protocol is available. The measures are based on a divergence, a bag distance, and on estimating the labels with a clustering procedure. We study whether the asymmetry can improve classification. Out of the three similarity measures, the bag similarity measure is the most robust across different studies and achieves excellent results on four brain tissue segmentation datasets and three white matter lesion segmentation datasets, acquired at different centers and with different scanners and scanning protocols. We show that the asymmetry can indeed be informative, and that computing the similarity from the test image to the training images is more appropriate than the opposite direction.
Keywords: Machine learning, transfer learning, domain adaptation, random forests, brain tissue segmentation, white matter lesions, MRI
1. Introduction
Manual biomedical image segmentation is time-consuming and subject to intra- and inter-expert variability, and thus in recent years a lot of advances have been made to automate this process. These include brain tissue (BT) segmentation and white matter lesion (WML) segmentation [2, 5, 6, 7, 8, 9].
This research was performed while Veronika Cheplygina was with the Biomedical Imaging Group Rotterdam, Erasmus Medical Center, The Netherlands. She is now with the Medical Image Analysis group, Eindhoven University of Technology, The Netherlands.
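The similarity-weighted ensemble described in the abstract can be sketched as follows. The sketch assumes that each per-image classifier returns class posteriors for a test voxel and that a dissimilarity between each training image and the test image has already been computed. Turning dissimilarities into weights with a normalized exponential, and the `scale` parameter, are illustrative assumptions rather than the paper's weighting scheme.

```python
import math


def similarity_weights(dissimilarities, scale=1.0):
    """Map dissimilarities to weights that sum to 1.

    Smaller dissimilarity to the test image gives a larger weight.
    The exponential form and `scale` are assumptions for illustration.
    """
    raw = [math.exp(-d / scale) for d in dissimilarities]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_ensemble_posterior(posteriors, weights):
    """Combine per-classifier class posteriors into one weighted posterior."""
    n_classes = len(posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, posteriors))
            for c in range(n_classes)]


# Hypothetical values: three classifiers, each trained on one image,
# scoring one test voxel over two tissue classes.
dissimilarities = [0.1, 2.0, 5.0]
posteriors = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

weights = similarity_weights(dissimilarities)
combined = weighted_ensemble_posterior(posteriors, weights)
predicted_class = combined.index(max(combined))
```

In this toy case the classifier whose training image is closest to the test image dominates the vote, even though the other two classifiers favor the other class.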
Co-Clustering for Multitask Learning
Murugesan, Keerthiram, Carbonell, Jaime, Yang, Yiming
This paper presents a new multitask learning framework that learns a shared representation among the tasks, incorporating both task and feature clusters. The jointly-induced clusters yield a shared latent subspace where task relationships are learned more effectively and more generally than in state-of-the-art multitask learning methods. The proposed general framework enables the derivation of more specific or restricted state-of-the-art multitask methods. The paper also proposes a highly-scalable multitask learning algorithm, based on the new framework, using conjugate gradient descent and generalized Sylvester equations. Experimental results on synthetic and benchmark datasets show that the proposed method systematically outperforms several state-of-the-art multitask learning methods.
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Taylor W. (Harvard University) | Konidaris, George (Brown University) | Doshi-Velez, Finale (Harvard University)
An intriguing application of transfer learning emerges when tasks arise with similar, but not identical, dynamics. Hidden Parameter Markov Decision Processes (HiP-MDP) embed these tasks into a low-dimensional space; given the embedding parameters one can identify the MDP for a particular task. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an arduous training procedure. In this work, we apply a Gaussian Process latent variable model to jointly model the dynamics and the embedding, leading to a more elegant formulation, one that allows for better uncertainty quantification and thus more robust transfer.