Transfer Learning
Understanding Transfer Learning for Medical Imaging
ImageNet pre-training) is a common practice in deep learning where a pre-trained network is fine-tuned on a new dataset/task. This practice is implicitly justified by feature-reuse where features learned from ImageNet are beneficial to other datasets/tasks. This paper [1] evaluates this justification on medical images datasets. The paper concludes that (i) transfer learning does not significantly help performance, (ii) smaller, simpler convolutional architectures perform comparably to standard ImageNet models, and (iii) there are feature-independent, and not feature-reuse, benefits to pre-training, i.e., speed convergence. These three differences question the idea of feature-reuse.
Transfer learning vs federated learning: A comparative analysis
Think of the things artificial intelligence and machine learning have accomplished in the last few years– real-time translations, outperforming humans at board games, drug discovery etc. Transfer learning, federated learning, reinforcement learning, self-supervised learning etc., are the cutting-edge techniques that made these milestones possible. While transfer learning is an old machine learning technique, federated learning was introduced in 2017 by Google. Deep learning models need huge swathes of labelled data to be trained on to learn and work effectively. The process is also time-consuming. Transfer learning can help tackle these challenges.
What Is Transfer Learning?
Editor's note: The name of the NVIDIA Transfer Learning Toolkit was changed to NVIDIA TAO Toolkit in August 2021. All references to the name have been updated in this blog. You probably have a career. But hit the books for a graduate degree or take online certificate courses by night, and you could start a new career building on your past experience. Transfer learning is the same idea.
The Idea Behind Transfer Learning: Stand on the Shoulders of Giants
Training big networks on large datasets is expensive considering computational equipment, engineers working on the problem in terms of human resources is also very demanding; trials and errors in training models from the scratch can be time consuming, inefficient and unproductive. Imagine the simple problem of classification on unstructured data in medical domain like sorting the X-rays and training the network to identify if there's broken bone or not. To reach any decent accuracy model has to learn what a broken bone looks like based on images in dataset, it has to make sense of pixels, edges and shapes. This is where the idea of Transfer Learning kicks in: model that is trained on similar data is now taken for the new purpose, weights are frozen and non-trainable layers will be incorporated into a new model that is capable of solving similar problem on smaller dataset. Similarly to Computer Vision type of problem, NLP tasks can also be managed with Transfer Learning methods: for example if we are building a model that takes descriptions of patient symptoms where aim is to predict the possible conditions associated with symptoms; in such case model is required to learn language semantics and how the sequence of words creates the meaning.
Transfer learning for cross-modal demand prediction of bike-share and public transit
Hua, Mingzhuang, Pereira, Francisco Camara, Jiang, Yu, Chen, Xuewu
The urban transportation system is a combination of multiple transport modes, and the interdependencies across those modes exist. This means that the travel demand across different travel modes could be correlated as one mode may receive demand from or create demand for another mode, not to mention natural correlations between different demand time series due to general demand flow patterns across the network. It is expectable that cross-modal ripple effects become more prevalent, with Mobility as a Service. Therefore, by propagating demand data across modes, a better demand prediction could be obtained. To this end, this study explores various machine learning models and transfer learning strategies for cross-modal demand prediction. The trip data of bike-share, metro, and taxi are processed as the station-level passenger flows, and then the proposed prediction method is tested in the large-scale case studies of Nanjing and Chicago. The results suggest that prediction models with transfer learning perform better than unimodal prediction models. Furthermore, stacked Long Short-Term Memory model performs particularly well in cross-modal demand prediction. These results verify our combined method's forecasting improvement over existing benchmarks and demonstrate the good transferability for cross-modal demand prediction in multiple cities.
Representation Learning via Consistent Assignment of Views to Clusters
Silva, Thalles, Rivera, Adín Ramírez
We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised learning method to learn visual representations by combining ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive learning from a clustering perspective, CARL learns unsupervised representations by learning a set of general prototypes that serve as energy anchors to enforce different views of a given image to be assigned to the same prototype. Unlike contemporary work on contrastive learning with deep clustering, CARL proposes to learn the set of general prototypes in an online fashion, using gradient descent without the necessity of using non-differentiable algorithms or K-Means to solve the cluster assignment problem. CARL surpasses its competitors in many representations learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.
Transfer Learning Approaches for Neuroimaging Analysis: A Scoping Review
Deep learning algorithms have been moderately successful in diagnoses of diseases by analyzing medical images especially through neuroimaging that is rich in annotated data. Transfer learning methods have demonstrated strong performance in tackling annotated data. It utilizes and transfers knowledge learned from a source domain to target domain even when the dataset is small. There are multiple approaches to transfer learning that result in a range of performance estimates in diagnosis, detection, and classification of clinical problems. Therefore, in this paper, we reviewed transfer learning approaches, their design attributes, and their applications to neuroimaging problems. We reviewed two main literature databases and included the most relevant studies using predefined inclusion criteria. Among 50 reviewed studies, more than half of them are on transfer learning for Alzheimer's disease. Brain mapping and brain tumor detection were second and third most discussed research problems, respectively. The most common source dataset for transfer learning was ImageNet, which is not a neuroimaging dataset. This suggests that the majority of studies preferred pre-trained models instead of training their own model on a neuroimaging dataset. Although, about one third of studies designed their own architecture, most studies used existing Convolutional Neural Network architectures. Magnetic Resonance Imaging was the most common imaging modality. In almost all studies, transfer learni...
Transfer-Learning Across Datasets with Different Input Dimensions: An Algorithm and Analysis for the Linear Regression Case
Silvestrin, Luis Pedro, van Zanten, Harry, Hoogendoorn, Mark, Koole, Ger
With the development of new sensors and monitoring devices, more sources of data become available to be used as inputs for machine learning models. These can on the one hand help to improve the accuracy of a model. On the other hand however, combining these new inputs with historical data remains a challenge that has not yet been studied in enough detail. In this work, we propose a transfer-learning algorithm that combines the new and the historical data, that is especially beneficial when the new data is scarce. We focus the approach on the linear regression case, which allows us to conduct a rigorous theoretical study on the benefits of the approach. We show that our approach is robust against negative transfer-learning, and we confirm this result empirically with real and simulated data.
Generative multitask learning mitigates target-causing confounding
Makino, Taro, Geras, Krzysztof, Cho, Kyunghyun
We propose a simple and scalable approach to causal representation learning for multitask learning. Our approach requires minimal modification to existing ML systems, and improves robustness to prior probability shift. The improvement comes from mitigating unobserved confounders that cause the targets, but not the input. We refer to them as target-causing confounders. These confounders induce spurious dependencies between the input and targets. This poses a problem for the conventional approach to multitask learning, due to its assumption that the targets are conditionally independent given the input. Our proposed approach takes into account the dependency between the targets in order to alleviate target-causing confounding. All that is required in addition to usual practice is to estimate the joint distribution of the targets to switch from discriminative to generative classification, and to predict all targets jointly. Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Al-Halah, Ziad, Ramakrishnan, Santhosh K., Grauman, Kristen
In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality. We present a unified approach to visual navigation using a novel modular transfer learning model. Our model can effectively leverage its experience from one source task and apply it to multiple target tasks (e.g., ObjectNav, RoomNav, ViewNav) with various goal modalities (e.g., image, sketch, audio, label). Furthermore, our model enables zero-shot experience learning, whereby it can solve the target tasks without receiving any task-specific interactive training. Our experiments on multiple photorealistic datasets and challenging tasks show that our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.