 Transfer Learning


Day 18 of #DataScience28: Transfer Learning

#artificialintelligence

Transfer learning is a machine learning technique that allows models to use knowledge gained from previous tasks to improve performance on new, similar tasks. Transfer learning has become an important tool in the field of machine learning because it can dramatically reduce the amount of time and data needed to train models, and it can lead to better performance on a wide range of tasks. The basic idea behind transfer learning is that a model can learn features that are useful for one task and then reuse those features for another task. For example, a model trained to recognize images of cars might learn to recognize the wheels, headlights, and grille of a car. Those features could then be reused in a model trained to recognize images of trucks, even though the truck images were not part of the original training data.


Few-shot Multimodal Multitask Multilingual Learning

arXiv.org Artificial Intelligence

While few-shot learning as a transfer learning paradigm has gained significant traction for scenarios with limited data, it has primarily been explored in the context of building unimodal and unilingual models. Furthermore, a significant part of the existing literature on few-shot multitask learning performs in-context learning, which requires manually generated prompts as input and yields varying outcomes depending on the level of manual prompt engineering. In addition, in-context learning suffers from substantial computational, memory, and storage costs, which eventually lead to high inference latency because all of the prompt's examples must be run through the model every time a prediction is made. In contrast, methods based on transfer learning via fine-tuning avoid these issues at a one-time cost of fine-tuning weights on a per-task basis. However, such methods lack exposure to few-shot multimodal multitask learning. In this paper, we propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models using task-specific hypernetworks and contrastively fine-tuning them to enable few-shot learning. FM3's architecture combines the best of both worlds of in-context and fine-tuning based learning and consists of three major components: (i) multimodal contrastive fine-tuning to enable few-shot learning, (ii) hypernetwork task adaptation to perform multitask learning, and (iii) task-specific output heads to cater to a plethora of diverse tasks. FM3 learns the most prominent tasks in the vision and language domains along with their intersections, namely visual entailment (VE), visual question answering (VQA), and natural language understanding (NLU) tasks such as named entity recognition (NER) and the GLUE benchmark including QNLI, MNLI, QQP, and SST-2.


An Introduction to Transfer Learning in Deep Learning: A Very Short Introduction for Beginners

#artificialintelligence

Transfer Learning is a machine learning technique that enables the transfer of knowledge from a source task to a related target task. It is used to save time and computational resources in training deep learning models by leveraging existing models that have already been trained on large amounts of data. Deep learning models require large amounts of data and computational resources to train. Transfer learning allows us to overcome these limitations by reusing knowledge learned from existing models and fine-tuning them to solve a new task. The process of transfer learning in deep learning can be broken down into two main steps: reusing pre-trained models and fine-tuning pre-trained models.
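
The two steps above can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption for illustration: the "pre-trained" weights are random stand-ins rather than weights from a real source model, the target task is a toy one, and only a new output head is trained while the reused layer stays frozen. Full fine-tuning would also update the reused weights, usually with a small learning rate.

```python
import math
import random

random.seed(0)

N_IN, N_FEAT = 4, 16

# Hypothetical stand-in for pre-trained weights; in practice these would be
# loaded from a model trained on a large source dataset.
W_PRETRAINED = [[random.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(N_IN)]

def extract_features(x):
    """Step 1: reuse the pre-trained layer (linear map + ReLU); weights stay frozen."""
    return [max(0.0, sum(x[i] * W_PRETRAINED[i][j] for i in range(N_IN)))
            for j in range(N_FEAT)]

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, lr=0.1, steps=300):
    """Step 2: train a new logistic-regression head on the frozen features."""
    w, b = [0.0] * N_FEAT, 0.0
    n = len(labels)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * N_FEAT, 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wj * fj for wj, fj in zip(w, f)) + b) - y
            grad_b += err
            for j in range(N_FEAT):
                grad_w[j] += err * f[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, f):
    return 1 if sum(wj * fj for wj, fj in zip(w, f)) + b > 0 else 0

# Toy target task (made up): the label is 1 when x[0] + x[1] > 0.
X = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(200)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]

F = [extract_features(x) for x in X]
w_head, b_head = train_head(F, y)
accuracy = sum(predict(w_head, b_head, f) == t for f, t in zip(F, y)) / len(y)
print(f"training accuracy of the new head: {accuracy:.2f}")
```

Only `w_head` and `b_head` are learned here, so the cost of adapting to the new task scales with the size of the head rather than the size of the reused model.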


Frugal day-ahead forecasting of multiple local electricity loads by aggregating adaptive models

arXiv.org Artificial Intelligence

We focus on day-ahead electricity load forecasting of substations of the distribution network in France; therefore, our problem lies between the instability of a single consumption and the stability of a countrywide total demand. Moreover, we are interested in forecasting the loads of over one thousand substations; consequently, we are in the context of forecasting multiple time series. To that end, we rely on an adaptive methodology that provided excellent results at a national scale; the idea is to combine generalized additive models with state-space representations. However, the extension of this methodology to the prediction of over a thousand time series raises a computational issue. We solve it by developing a frugal variant, reducing the number of parameters estimated; we estimate the forecasting models only for a few time series and achieve transfer learning by relying on aggregation of experts. It yields a reduction of computational needs and their associated emissions. We build several variants, corresponding to different levels of parameter transfer, and we look for the best trade-off between accuracy and frugality. The selected method achieves competitive results compared to state-of-the-art individual models. Finally, we highlight the interpretability of the models, which is important for operational applications.
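
To make "aggregation of experts" concrete, here is a minimal sketch. The abstract does not say which aggregation rule the authors use, so the exponentially weighted average below is a standard rule swapped in for illustration, and the function name, learning rate `eta`, and toy numbers are all assumptions. Each expert stands for a forecasting model fitted on some other time series; the weights are learned online for the target series.

```python
import math

def ewa_aggregate(expert_forecasts, observations, eta=0.5):
    """Exponentially weighted average of expert forecasts.

    expert_forecasts[t][k] is expert k's forecast at step t; observations[t]
    is revealed only after the aggregated forecast for step t is made.
    """
    n_experts = len(expert_forecasts[0])
    weights = [1.0 / n_experts] * n_experts
    aggregated = []
    for forecasts, truth in zip(expert_forecasts, observations):
        # Forecast first, using the weights learned from past steps only.
        aggregated.append(sum(w * f for w, f in zip(weights, forecasts)))
        # Then down-weight each expert according to its squared error.
        weights = [w * math.exp(-eta * (f - truth) ** 2)
                   for w, f in zip(weights, forecasts)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return aggregated, weights

# Toy data (made up): expert 0 is biased high by 2, expert 1 is close to the truth.
truth = [10.0, 11.0, 12.0, 11.5, 10.5]
experts = [[t + 2.0, t + 0.1] for t in truth]
forecasts, final_weights = ewa_aggregate(experts, truth)
print(final_weights)
```

In this toy run the weight shifts almost entirely to the accurate expert after a few steps. Only the weight vector is estimated per target series, which is the sense in which such a scheme is cheaper than fitting a full model for every series.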


[2302.04870] Offsite-Tuning: Transfer Learning without Full Model

#artificialintelligence

Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite-tuning, the model owner sends a lightweight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-tuning preserves both parties' privacy and is computationally more efficient than existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of offsite-tuning on various large language and vision foundation models. Offsite-tuning can achieve accuracy comparable to full model fine-tuning while being privacy-preserving and efficient, achieving a 6.5x speedup and a 5.6x memory reduction. Code is available at https://github.com/mit-han-lab/offsite-tuning.
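
The exchange between model owner and data owner can be sketched with a deliberately tiny toy. This is not the authors' code: scalar gains stand in for network blocks, the adapter is assumed to be the first and last layer, and the emulator is assumed to be built by dropping every other middle layer. All names and numbers are hypothetical.

```python
def forward(layers, x):
    """Apply a chain of scalar gain layers (stand-ins for network blocks)."""
    for gain in layers:
        x *= gain
    return x

def mse(layers, data, target_gain):
    """Mean squared error against the downstream target y = target_gain * x."""
    return sum((forward(layers, x) - target_gain * x) ** 2 for x in data) / len(data)

# --- Model owner: split the full model and build a lossy emulator ---
full_model = [1.0, 1.05, 0.95, 1.1, 0.9, 1.0]
adapter = [full_model[0], full_model[-1]]   # small trainable part (assumed: first/last layer)
middle = full_model[1:-1]                   # frozen bulk, never leaves the model owner
emulator = middle[::2]                      # lossy compression (assumed: drop every other layer)

# --- Data owner: fine-tune only the adapter, with the emulator replacing the middle ---
data = [0.5, 1.0, 1.5, 2.0]                 # private downstream inputs (made up)
target_gain = 3.0                           # downstream task: y = 3 * x
emu_gain = forward(emulator, 1.0)
mean_sq = sum(x * x for x in data) / len(data)
a_in, a_out = adapter
for _ in range(200):
    # Gradient of the MSE with respect to the two adapter gains.
    scale = 2 * (a_in * emu_gain * a_out - target_gain) * mean_sq
    grad_in, grad_out = scale * emu_gain * a_out, scale * emu_gain * a_in
    a_in, a_out = a_in - 0.02 * grad_in, a_out - 0.02 * grad_out
tuned_adapter = [a_in, a_out]

# --- Model owner: plug the returned adapter into the full model ---
adapted_model = [tuned_adapter[0]] + middle + [tuned_adapter[1]]

loss_before = mse(full_model, data, target_gain)
loss_after = mse(adapted_model, data, target_gain)
print(loss_before, loss_after)
```

The data never leaves the data owner and the middle layers never leave the model owner. Because the emulator only approximates the middle, the plugged-in model is better than the original on the downstream task but not exactly optimal, which mirrors the "lossy" trade-off the abstract describes.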


Transfer learning for process design with reinforcement learning

arXiv.org Artificial Intelligence

Process design is a creative task that is currently performed manually by engineers. Artificial intelligence provides new potential to facilitate process design. Specifically, reinforcement learning (RL) has shown some success in automating process design by integrating data-driven models that learn to build process flowsheets with process simulation in an iterative design process. However, one major challenge in the learning process is that the RL agent demands numerous process simulations in rigorous process simulators, thereby requiring long simulation times and expensive computational power. Therefore, short-cut simulation methods are typically employed to accelerate the learning process. Short-cut methods can, however, lead to inaccurate results. We thus propose to utilize transfer learning for process design with RL in combination with rigorous simulation methods. Transfer learning is an established approach from machine learning that stores knowledge gained while solving one problem and reuses this information on a different target domain. We integrate transfer learning in our RL framework for process design and apply it to an illustrative case study comprising equilibrium reactions, azeotropic separation, and recycles. In this case study, our method can design economically feasible flowsheets with stable interaction with DWSIM. Our results show that transfer learning enables RL to economically design feasible flowsheets with DWSIM, resulting in a flowsheet with an 8% higher revenue, and that the learning time can be reduced by a factor of 2.


ClueGAIN: Application of Transfer Learning On Generative Adversarial Imputation Nets (GAIN)

arXiv.org Artificial Intelligence

Many studies have attempted to solve the problem of missing data using various approaches. Among them, Generative Adversarial Imputation Nets (GAIN) was the first to impute data with Generative Adversarial Nets (GAN), and it obtained good results. Subsequent studies have attempted to combine various approaches to address some of its limitations. This study proposes ClueGAIN, which introduces transfer learning into GAIN to address poor imputation performance on data sets with high missing rates. ClueGAIN can also be used to measure the similarity between data sets to explore their potential connections.


Federated deep transfer learning for EEG decoding using multiple BCI tasks

arXiv.org Artificial Intelligence

Deep learning has been successful in BCI decoding. However, it is very data-hungry and requires pooling data from multiple sources. EEG data from various sources decrease the decoding performance due to negative transfer. Recently, transfer learning for EEG decoding has been suggested as a remedy and has become the subject of recent BCI competitions (e.g. BEETL), but there are two complications in combining data from many subjects. First, privacy is not protected, as highly personal brain data need to be shared (and copied across increasingly tight information governance boundaries). Second, BCI data are collected from different sources and are often based on different BCI tasks, which has been thought to limit their reusability. Here, we demonstrate a federated deep transfer learning technique, the Multi-dataset Federated Separate-Common-Separate Network (MF-SCSN), based on our previous work on SCSN, which integrates privacy-preserving properties into deep transfer learning to utilise data sets with different tasks. This framework trains a BCI decoder using different source data sets obtained from different imagery tasks (e.g. some data sets with hands and feet, vs. others with single hands and tongue, etc.). Therefore, by introducing privacy-preserving transfer learning techniques, we unlock the reusability and scalability of existing BCI data sets. We evaluated our federated transfer learning method on the NeurIPS 2021 BEETL competition BCI task. The proposed architecture outperformed the baseline decoder by 3%. Moreover, compared with the baseline and other transfer learning algorithms, our method protects the privacy of the brain data from different data centres.


ADAPT : Awesome Domain Adaptation Python Toolbox

arXiv.org Artificial Intelligence

In this paper, we introduce the ADAPT library, an open-source Python API that implements the main transfer learning and domain adaptation methods. The library is designed to be user-friendly, making domain adaptation accessible to a wide audience.