Transfer Learning
Feasibility and Transferability of Transfer Learning: A Mathematical Framework
Cao, Haoyang, Gu, Haotian, Guo, Xin, Rosenbaum, Mathieu
Transfer learning is an emerging and popular paradigm for utilizing existing knowledge from previous learning tasks to improve the performance of new ones. Despite its numerous empirical successes, theoretical analysis of transfer learning is limited. In this paper, we build, for the first time to the best of our knowledge, a mathematical framework for the general procedure of transfer learning. Our unique reformulation of transfer learning as an optimization problem allows, for the first time, an analysis of its feasibility. Additionally, we propose a novel concept of transfer risk to evaluate the transferability of transfer learning. Our numerical studies using the Office-31 dataset demonstrate the potential and benefits of incorporating transfer risk in the evaluation of transfer learning performance.
Learning Visual Representations for Transfer Learning by Suppressing Texture
Mishra, Shlok, Shah, Anshul, Bansal, Ankan, Anjaria, Janit, Choi, Jonghyun, Shrivastava, Abhinav, Sharma, Abhishek, Jacobs, David
Recent literature has shown that features obtained from supervised training of CNNs may over-emphasize texture rather than encoding high-level information. In self-supervised learning in particular, texture as a low-level cue may provide shortcuts that prevent the network from learning higher-level representations. To address these problems, we propose to use classic methods based on anisotropic diffusion to augment training with images whose texture has been suppressed. This simple method helps retain important edge information while suppressing texture. We empirically show that our method achieves state-of-the-art results on object detection and image classification with eight diverse datasets in either supervised or self-supervised learning (e.g., with MoCoV2 and Jigsaw). Our method is particularly effective for transfer learning tasks, and we observed improved performance on five standard transfer learning datasets. The large improvements (up to 11.49%) on the Sketch-ImageNet and DTD datasets, together with additional visual analyses using saliency maps, suggest that our approach helps learn representations that transfer better.
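Anisotropic diffusion, the classic technique this abstract builds on, smooths an image while largely stopping at strong edges. The sketch below is the textbook Perona-Malik scheme in pure Python, not the authors' implementation; the parameter names `n_iter`, `kappa` and `step`, the exponential edge-stopping function and the zero-flux borders are assumptions chosen for illustration.

```python
import math
from typing import List

Image = List[List[float]]


def perona_malik(image: Image, n_iter: int = 10, kappa: float = 0.1,
                 step: float = 0.2) -> Image:
    """Explicit Perona-Malik anisotropic diffusion on a 2D grayscale image.

    Illustrative sketch: `kappa` sets which intensity differences are treated
    as edges, and `step` must stay <= 0.25 for this explicit 4-neighbour
    scheme to remain stable.
    """
    h, w = len(image), len(image[0])
    u = [row[:] for row in image]

    def g(d: float) -> float:
        # Edge-stopping function: close to 1 for small differences (texture),
        # close to 0 across strong edges, so edges receive almost no flux.
        return math.exp(-(d / kappa) ** 2)

    for _ in range(n_iter):
        new = [row[:] for row in u]
        for i in range(h):
            for j in range(w):
                flux = 0.0
                # Sum the weighted differences to the four neighbours;
                # out-of-bounds neighbours are skipped (zero-flux borders).
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        d = u[ni][nj] - u[i][j]
                        flux += g(d) * d
                new[i][j] = u[i][j] + step * flux
        u = new
    return u
```

With a small `kappa`, low-contrast texture (neighbour differences well below `kappa`) is diffused away within a few iterations, while a high-contrast edge contributes almost no flux and survives, which is the "retain edges, suppress texture" behaviour the abstract relies on.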
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Liu, Ruyang, Huang, Jingjia, Li, Ge, Feng, Jiashi, Wu, Xinglong, Li, Thomas H.
Image-text pretrained models, e.g., CLIP, have shown impressive general multi-modal knowledge learned from large-scale image-text data pairs, thus attracting increasing attention for their potential to improve visual representation learning in the video domain. In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain. We find that current temporal modeling mechanisms are tailored to either high-level semantic-dominant tasks (e.g., retrieval) or low-level visual pattern-dominant tasks (e.g., recognition), and fail to work on the two cases simultaneously. The key difficulty lies in modeling temporal dependency while taking advantage of both high-level and low-level knowledge in the CLIP model. To tackle this problem, we present Spatial-Temporal Auxiliary Network (STAN), a simple and effective temporal modeling mechanism extending the CLIP model to diverse video tasks. Specifically, to realize both low-level and high-level knowledge…
However, it is hard to obtain a pretrained model as powerful as CLIP in the video domain, due to the unaffordable demands on computation resources and the difficulty of collecting video-text data pairs as large and diverse as image-text data. Instead of directly pursuing video-text pretrained models [17, 27], a potential alternative that benefits downstream video tasks is to transfer the knowledge in image-text pretrained models to the video domain, which has attracted increasing attention in recent years [12, 13, 26, 29, 30, 41]. Extending pretrained 2D image models to the video domain is a widely studied topic in deep learning [4, 7], and the key point lies in empowering 2D models with the capability of modeling temporal dependency between video frames while taking advantage of the knowledge in the pretrained models. In this paper, based on CLIP [32], we revisit temporal modeling in the context of image-to-video knowledge transferring and present Spatial-Temporal Auxiliary Network (STAN), a new temporal modeling method that is easy and effective for extending image-text pretrained models to diverse downstream video tasks.
Classification of Luminal Subtypes in Full Mammogram Images Using Transfer Learning
Panambur, Adarsh Bhandary, Madhu, Prathmesh, Maier, Andreas
Automatic identification of patients with luminal and non-luminal subtypes during routine mammography screening can support clinicians in streamlining breast cancer therapy planning. Recent machine learning techniques have shown promising results in molecular subtype classification in mammography; however, they are highly dependent on pixel-level annotations and on handcrafted and radiomic features. In this work, we provide initial insights into luminal subtype classification in full mammogram images, trained using only image-level labels. Transfer learning is applied from a breast abnormality classification task to fine-tune a ResNet-18-based classifier for luminal versus non-luminal subtypes. We present and compare our results on the publicly available CMMD dataset and show that our approach significantly outperforms the baseline classifier, achieving a mean AUC score of 0.6688 and a mean F1 score of 0.6693 on the test dataset. The improvement over the baseline is statistically significant (p < 0.0001).
Building Machine Learning Models Like Open Source Software
Transfer learning, which uses a pretrained machine learning (ML) model as the starting point for training on a different but related task, has proven to be an effective way to make models converge faster to a better solution with less labeled data. These benefits have led pretrained models to see a staggering amount of reuse; for example, the pretrained BERT model has been downloaded tens of millions of times. Taking a step back, however, reveals a major issue with the development of pretrained models: they are never updated! Instead, after being released, they are typically used as-is until a better pretrained model comes along. There are many reasons to update a pretrained model (for example, to improve its performance, address problematic behavior and biases, or make it applicable to new problems), but there is currently no effective approach for updating models.
Towards Estimating Transferability using Hard Subsets
Menta, Tarun Ram, Jandial, Surgan, Patil, Akash, KB, Vimal, Bachu, Saketh, Krishnamurthy, Balaji, Balasubramanian, Vineeth N., Agarwal, Chirag, Sarkar, Mausoom
As transfer learning techniques are increasingly used to transfer knowledge from a source model to a target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning. By leveraging the model's internal and output representations, we introduce two techniques, one class-agnostic and one class-specific, to identify harder subsets and show that H…
Transfer learning (Pan & Yang, 2009; Torrey & Shavlik, 2010; Weiss et al., 2016) aims to improve the performance of models on target tasks by utilizing knowledge from source tasks. With the increasing development of large-scale pre-trained models (Devlin et al., 2019; Chen et al., 2020a;b; Radford et al., 2021b) and the availability of multiple model choices for transfer learning (e.g., the model hubs of PyTorch, TensorFlow, and Hugging Face), it is critical to estimate their transferability without training on the target task, and to determine how effectively transfer learning algorithms will transfer knowledge from the source to the target task. To this end, transferability estimation metrics (Zamir et al., 2018b; Achille et al., 2019; Tran et al., 2019b; Pándy et al., 2022; Nguyen et al., 2020) have recently been proposed to quantify how easily the knowledge learned by these models can be used with minimal to no additional training on the target dataset. Given multiple pre-trained source models and target datasets, estimating transferability is essential because it is non-trivial to determine which source model transfers best to a target dataset, and training multiple models on all source-target combinations can be computationally expensive. Recent years have seen a few different approaches (Zamir et al., 2018b; Achille et al., 2019; Tran et al., 2019b; Pándy et al., 2022; Nguyen et al., 2020) for estimating how well a source model transfers to a given target task.
However, such existing methods often require performing the transfer learning task for parameter optimization (Achille et al., 2019; Zamir et al., 2018b) or making strong assumptions about the source and target datasets (Tran et al., 2019b; Zamir et al., 2018b).
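Of the transferability metrics cited above, LEEP (Nguyen et al., 2020) is simple enough to sketch: it scores a source model by the average log-likelihood of the true target labels under a predictor that combines the source model's predicted distribution over source labels z with an empirical conditional P(y | z) estimated on the target data. The pure-Python version below follows that published definition; it is not the hard-subset method of this paper, and the function name and argument layout are illustrative assumptions.

```python
import math
from typing import Sequence


def leep(source_probs: Sequence[Sequence[float]],
         target_labels: Sequence[int],
         num_target_classes: int) -> float:
    """LEEP score of a source model on a labelled target dataset (sketch).

    source_probs[i][z]: source model's predicted probability of source label z
    on target example i. target_labels[i]: target label in range(num_target_classes).
    """
    n = len(source_probs)
    num_source = len(source_probs[0])
    # Empirical joint P(y, z) = (1/n) * sum_i source_probs[i][z] * [y_i == y].
    joint = [[0.0] * num_source for _ in range(num_target_classes)]
    for probs, y in zip(source_probs, target_labels):
        for z, p in enumerate(probs):
            joint[y][z] += p / n
    # Marginal P(z) and conditional P(y | z) = P(y, z) / P(z).
    marginal = [sum(joint[y][z] for y in range(num_target_classes))
                for z in range(num_source)]
    cond = [[joint[y][z] / marginal[z] if marginal[z] > 0 else 0.0
             for z in range(num_source)]
            for y in range(num_target_classes)]
    # Average log-likelihood of sum_z P(y_i | z) * source_probs[i][z].
    total = 0.0
    for probs, y in zip(source_probs, target_labels):
        total += math.log(sum(cond[y][z] * p for z, p in enumerate(probs)))
    return total / n
```

Scores lie in (-∞, 0]. A value closer to 0 suggests that the source model's predictions already separate the target classes well, whereas a source model whose predictions carry no information about the target labels scores log(1/K) for K balanced target classes. Computing it needs only one forward pass of the source model over the target data, with no fine-tuning.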
Language-Informed Transfer Learning for Embodied Household Activities
Jiang, Yuqian, Gao, Qiaozi, Thattai, Govind, Sukhatme, Gaurav
For service robots to become general-purpose in everyday household environments, they need not only a large library of primitive skills, but also the ability to quickly learn novel tasks specified by users. Fine-tuning neural networks on a variety of downstream tasks has been successful in many vision and language domains, but research is still limited on transfer learning between diverse long-horizon tasks. We propose that, compared to learning a new household activity from scratch with reinforcement learning, home robots can benefit from transferring the value and policy networks trained for similar tasks. We evaluate this idea in the BEHAVIOR simulation benchmark, which includes a large number of household activities and a set of action primitives. For easy mapping between the state spaces of different tasks, we provide a text-based representation and leverage language models to produce a common embedding space. The results show that the selection of similar source activities can be informed by the semantic similarity of state and goal descriptions with the target task. We further analyze the results and discuss ways to overcome the problem of catastrophic forgetting.
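Ranking candidate source activities by the semantic similarity of their descriptions to the target task can be sketched with cosine similarity over embedding vectors. The sketch assumes the embeddings have already been produced by some language model (not shown); the helper names and the activity names and three-dimensional vectors in the usage block are hypothetical toy values, not BEHAVIOR data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_sources(target: Sequence[float],
                 sources: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort source activities by similarity to the target."""
    scored = [(name, cosine(target, emb)) for name, emb in sources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from a language model.
    target = [0.9, 0.1, 0.0]
    sources = {
        "hypothetical_wash_dishes": [0.8, 0.2, 0.1],
        "hypothetical_mop_floor": [0.1, 0.9, 0.2],
        "hypothetical_water_plants": [0.0, 0.1, 0.9],
    }
    for name, score in rank_sources(target, sources):
        print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector magnitude, so only the direction of each description's embedding matters; under this sketch, the top-ranked activity would be the first candidate whose value and policy networks to transfer.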
Learning Transfer Learning. Transfer learning is the process of…
This concept is commonly studied in the field of machine learning, where it is used to refer to the practice of storing knowledge gained from solving one problem and applying it to a different, related problem. Transfer learning is often viewed as a design methodology, as it involves applying previously learned information to new situations in order to improve the efficiency and effectiveness of the learning process. In other words, transfer learning allows individuals or machine learning algorithms to build upon their existing knowledge and skills in order to solve new problems. Transfer learning involves taking knowledge and skills acquired in one context and applying them to a different, but related situation. For example, if you have learned how to recognize cars, that knowledge could be useful in learning how to recognize trucks. Similarly, if you have learned how to ride a motorbike, that knowledge may be transferable to learning how to ride an e-scooter.
MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning
Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require. In this work, we investigate two techniques for training monolingual TLMs in a low-resource setting: greatly reducing TLM size, and complementing the masked language modeling objective with two linguistically rich supervised tasks (part-of-speech tagging and dependency parsing). Results from 7 diverse languages indicate that our model, MicroBERT, is able to produce marked improvements in downstream task evaluations relative to a typical monolingual TLM pretraining approach. Specifically, we find that monolingual MicroBERT models achieve gains of up to 18% for parser LAS and 11% for NER F1 compared to a multilingual baseline, mBERT, while having less than 1% of its parameter count. We conclude that reducing TLM parameter count and using labeled data for pretraining low-resource TLMs can yield large quality benefits and in some cases produce models that outperform multilingual approaches.
A Survey on Deep Industrial Transfer Learning in Fault Prognostics
Due to its probabilistic nature, fault prognostics is a prime example of a use case for deep learning utilizing big data. However, the low availability of such data sets, combined with the high effort of fitting, parameterizing and evaluating complex learning algorithms for the heterogeneous and dynamic settings typical of industrial applications, oftentimes prevents the practical application of this approach. Automatic adaptation to new or dynamically changing fault prognostics scenarios can be achieved using transfer learning or continual learning methods. In this paper, a first survey of such approaches is carried out, aiming at establishing best practices for future research in this field. It is shown that the field lacks common benchmarks to robustly compare results and facilitate scientific progress. Therefore, the data sets utilized in these publications are surveyed as well in order to identify suitable candidates for such benchmark scenarios.