AITopics | Transfer Learning

Transfer Learning

Transfer Learning is the reuse of a pre-trained model on a new problem. (Towards Data Science)

The (In)Effectiveness of Intermediate Task Training For Domain Adaptation and Cross-Lingual Transfer Learning

arXiv.org Artificial Intelligence, Nov-4-2022

Transfer learning from large language models (LLMs) has emerged as a powerful technique for knowledge-based fine-tuning on a number of tasks and for adapting models to different domains and even languages. However, it remains an open question whether and when transfer learning will work, i.e., whether it leads to positive or negative transfer. In this paper, we analyze knowledge transfer across three natural language processing (NLP) tasks (text classification, sentiment analysis, and sentence similarity) using three LLMs (BERT, RoBERTa, and XLNet). We analyze their performance by fine-tuning on target datasets for domain and cross-lingual adaptation tasks, with and without intermediate task training on a larger dataset. Our experiments showed that fine-tuning without intermediate task training can lead to better performance for most tasks, while more generalized tasks might necessitate a preceding intermediate task training step. We hope that this work will act as a guide on transfer learning for NLP practitioners.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2210.01091

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.70)

Industry: Media (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

Shu, Yang, Cao, Zhangjie, Zhang, Ziyang, Wang, Jianmin, Long, Mingsheng

arXiv.org Artificial Intelligence, Nov-4-2022

Transfer learning aims to leverage knowledge from pre-trained models to benefit the target task. Prior transfer learning work mainly transfers from a single model. However, with the emergence of deep models pre-trained from different resources, model hubs consisting of diverse models with various architectures, pre-training datasets and learning paradigms are available. Directly applying single-model transfer learning methods to each model wastes the abundant knowledge of the model hub and suffers from high computational cost. In this paper, we propose a Hub-Pathway framework to enable knowledge transfer from a model hub. The framework generates data-dependent pathway weights, based on which we assign the pathway routes at the input level to decide which pre-trained models are activated and passed through, and then set the pathway aggregation at the output level to aggregate the knowledge from different models to make predictions. The proposed framework can be trained end-to-end with the target task-specific loss, where it learns to explore better pathway configurations and exploit the knowledge in pre-trained models for each target datum. We utilize a noisy pathway generator and design an exploration loss to further explore different pathways throughout the model hub. To fully exploit the knowledge in pre-trained models, each model is further trained on the specific data that activate it, which ensures its performance and enhances knowledge transfer. Experimental results on computer vision and reinforcement learning tasks demonstrate that the proposed Hub-Pathway framework achieves state-of-the-art performance for model hub transfer learning.
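
The routing idea in this abstract (data-dependent pathway weights, activation of a subset of hub models per input, and aggregation of their outputs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear gate `gate_w`, the top-k selection rule, the Gaussian noise term, and the function name `hub_pathway_forward` are all assumptions made for illustration, and the exploration loss and per-model fine-tuning are omitted.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hub_pathway_forward(x, gate_w, models, k=2, noise_std=0.0, rng=None):
    """Route each input through its top-k hub models and mix their outputs.

    x:       (batch, d_in) inputs
    gate_w:  (d_in, n_models) weights of a hypothetical linear gate
    models:  list of callables mapping (n, d_in) -> (n, d_out)
    Returns the aggregated outputs and the renormalised pathway weights.
    """
    logits = x @ gate_w
    if noise_std > 0:
        # Assumed stand-in for a "noisy pathway generator": perturb the logits.
        if rng is None:
            rng = np.random.default_rng()
        logits = logits + noise_std * rng.standard_normal(logits.shape)
    weights = softmax(logits)                      # (batch, n_models)

    # Input level: keep only the k largest weights per input.
    topk = np.argsort(-weights, axis=1)[:, :k]
    mask = np.zeros(weights.shape, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=1)
    gated = np.where(mask, weights, 0.0)
    gated = gated / gated.sum(axis=1, keepdims=True)

    # Output level: weighted sum of the activated models' outputs.
    out = None
    for j, model in enumerate(models):
        rows = np.flatnonzero(mask[:, j])
        if rows.size == 0:
            continue                               # model j is never activated
        y = model(x[rows])
        if out is None:
            out = np.zeros((x.shape[0], y.shape[1]))
        out[rows] += gated[rows, j:j + 1] * y
    return out, gated
```

Each model is only evaluated on the inputs routed to it, which is what keeps the cost below running every hub model on every input.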

artificial intelligence, machine learning, pre-trained model, (16 more...)

arXiv.org Artificial Intelligence

2206.03726

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Transfer Learning with Synthetic Corpora for Spatial Role Labeling and Reasoning

Mirzaee, Roshanak, Kordjamshidi, Parisa

arXiv.org Artificial Intelligence, Nov-3-2022

Recent research shows that synthetic data as a source of supervision helps pretrained language models (PLMs) transfer to new target tasks and domains. However, this idea is less explored for spatial language. We provide two new data resources for multiple spatial language processing tasks. The first dataset is synthesized for transfer learning on spatial question answering (SQA) and spatial role labeling (SpRL). Compared to previous SQA datasets, we include a larger variety of spatial relation types and spatial expressions. Our data generation process is easily extendable with new spatial expression lexicons. The second is a real-world SQA dataset with human-generated questions built on an existing corpus with SpRL annotations. This dataset can be used to evaluate spatial language processing models in realistic situations. We show that pretraining with automatically generated data significantly improves the SOTA results on several SQA and SpRL benchmarks, particularly when the training data in the target domain is small.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.16952

Country:

North America > United States > Michigan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.83)

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5

Bigoulaeva, Irina, Sachdeva, Rachneet, Madabushi, Harish Tayyar, Villavicencio, Aline, Gurevych, Iryna

arXiv.org Artificial Intelligence, Oct-31-2022

We compare sequential fine-tuning with a model for multi-task learning in a setting where we are interested in boosting performance on two tasks, one of which depends on the other. We test these models on the FigLang2022 shared task, which requires participants to predict language inference labels on figurative language along with corresponding textual explanations of the inference predictions. Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting. Our findings show that simple sequential fine-tuning of text-to-text models is an extraordinarily powerful method for cross-task knowledge transfer while simultaneously predicting multiple interdependent targets, so much so that our best model achieved the (tied) highest score on the task.

explanation, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.17301

Country:

South America > Colombia > Meta Department > Villavicencio (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.51)

Transfer Learning with Kernel Methods

Radhakrishnan, Adityanarayanan, Luyten, Max Ruiz, Prasad, Neha, Uhler, Caroline

arXiv.org Artificial Intelligence, Oct-31-2022

Transfer learning refers to the machine learning problem of utilizing knowledge from a source task to improve performance on a target task. Recent approaches to transfer learning have achieved tremendous empirical success in many applications, including computer vision [17, 45], natural language processing [16, 40, 43], and the biomedical field [15, 19]. Since transfer learning approaches generally rely on complex deep neural networks, it can be difficult to characterize when and why they work [44]. Kernel methods [46] are conceptually and computationally simple machine learning models that have been found to be competitive with neural networks on a variety of tasks, including image classification [3, 29, 42] and drug screening [42]. Their simplicity stems from the fact that training a kernel method involves performing linear regression after transforming the data. There has been renewed interest in kernels due to a recently established equivalence between wide neural networks and kernel methods [2, 25], which has led to the development of modern neural tangent kernels (NTKs) that are competitive with neural networks. Given their simplicity and effectiveness, kernel methods could provide a powerful approach for transfer learning and also help characterize when transfer learning between a source and target task would be beneficial. However, developing an algorithm for transfer learning with kernel methods for general source and target tasks has been an open problem. In particular, while there is a standard transfer learning approach for neural networks that involves replacing and re-training the last layer of a pre-trained network, there is no known corresponding operation for kernels.
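
The statement that training a kernel method amounts to linear regression after transforming the data can be made concrete with kernel ridge regression. The sketch below is a generic illustration, not the transfer learning algorithm the paper develops: the Gaussian (RBF) kernel, the `gamma` and `ridge` parameters, and the function names are assumptions chosen for brevity.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def fit_kernel_ridge(x_train, y_train, gamma=1.0, ridge=1e-6):
    """Solve the regularised linear system (K + ridge * I) alpha = y."""
    k = rbf_kernel(x_train, x_train, gamma)
    return np.linalg.solve(k + ridge * np.eye(len(x_train)), y_train)


def predict_kernel_ridge(x_train, alpha, x_new, gamma=1.0):
    """Predictions are linear in the kernel features k(x_new, x_train)."""
    return rbf_kernel(x_new, x_train, gamma) @ alpha
```

Training reduces to a single linear solve, and prediction is a matrix product with the learned coefficients `alpha`; the only nonlinearity is the fixed kernel transformation.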

artificial intelligence, machine learning, predictor, (17 more...)

arXiv.org Artificial Intelligence

2211.00227

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Don't Waste Data: Transfer Learning to Leverage All Data for Machine-Learnt Climate Model Emulation

Parthipan, Raghul, Wischik, Damon J.

arXiv.org Artificial Intelligence, Oct-30-2022

How can we learn from all available data when training machine-learnt climate models, without incurring any extra cost at simulation time? Typically, the training data comprises coarse-grained high-resolution data. But only keeping this coarse-grained data means the rest of the high-resolution data is thrown out. We use a transfer learning approach, which can be applied to a range of machine learning models, to leverage all the high-resolution data. We use three chaotic systems to show it stabilises training, gives improved generalisation performance and results in better forecasting skill.
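
To illustrate what coarse-graining discards, here is a minimal sketch that reduces a high-resolution 1-D series by averaging consecutive blocks. Block averaging and the function name `coarse_grain` are assumptions for illustration; the abstract does not specify how the authors coarse-grain their data.

```python
import numpy as np


def coarse_grain(high_res, factor):
    """Average consecutive blocks of `factor` samples of a 1-D series.

    Trailing samples that do not fill a complete block are dropped.
    """
    n = (len(high_res) // factor) * factor
    blocks = np.asarray(high_res[:n], dtype=float).reshape(-1, factor)
    return blocks.mean(axis=1)
```

With `factor=2`, a series of six samples becomes three block means; the within-block variation is the high-resolution information that a model trained only on the coarse series never sees.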

artificial intelligence, high-resolution data, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2210.04001

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Accurate Bundle Matching and Generation via Multitask Learning with Partially Shared Parameters

Jeon, Hyunsik, Jang, Jun-Gi, Kim, Taehun, Kang, U

arXiv.org Artificial Intelligence, Oct-28-2022

How can we recommend existing bundles to users accurately? How can we generate new tailored bundles for users? Recommending a bundle, or a group of various items, has attracted widespread attention in e-commerce owing to the increased satisfaction of both users and providers. Bundle matching and bundle generation are two representative tasks in bundle recommendation. The bundle matching task is to correctly match existing bundles to users, while the bundle generation task is to generate new bundles that users would prefer. Although many recent works have developed bundle recommendation models, they fail to achieve high accuracy since they do not handle heterogeneous data effectively and do not learn a method for customized bundle generation. In this paper, we propose BundleMage, an accurate approach for bundle matching and generation. BundleMage effectively mixes user preferences for items and bundles using an adaptive gate technique to achieve high accuracy in bundle matching. BundleMage also generates a personalized bundle by learning a generation module that exploits a user's preferences and the characteristics of a given incomplete bundle to be completed. BundleMage further improves its performance using multi-task learning with partially shared parameters. Through extensive experiments, we show that BundleMage achieves up to 6.6% higher nDCG in bundle matching and 6.3x higher nDCG in bundle generation than the best competitors. We also provide a qualitative analysis showing that BundleMage effectively generates bundles considering both the tastes of users and the characteristics of target bundles.
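
The reported gains are measured in nDCG (normalized discounted cumulative gain). As a reminder of what that metric computes, here is a minimal sketch using the common linear-gain, log2-discount form; the abstract does not say which nDCG variant or cutoff the authors use, so both are assumptions here.

```python
import numpy as np


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items of a ranked list."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1) for ranks 1..k
    return float((rel / discounts).sum())


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the ground-truth relevance of each recommended item in ranked order, so a list with the single relevant item in first place scores 1.0 and the score drops as that item moves down the ranking.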

artificial intelligence, bundle, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1371/journal.pone.0280630

2210.1546

Country: Asia > South Korea > Seoul > Seoul (0.05)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

TAD: Transfer Learning-based Multi-Adversarial Detection of Evasion Attacks against Network Intrusion Detection Systems

Debicha, Islam, Bauwens, Richard, Debatty, Thibault, Dricot, Jean-Michel, Kenaza, Tayeb, Mees, Wim

arXiv.org Artificial Intelligence, Oct-27-2022

Intrusion detection systems (IDS) based on deep learning now deliver state-of-the-art performance. However, recent research has shown that specially crafted perturbations, called adversarial examples, are capable of significantly reducing the performance of these intrusion detection systems. The objective of this paper is to design an efficient transfer learning-based adversarial detector and then to assess the effectiveness of using multiple strategically placed adversarial detectors compared to a single adversarial detector for intrusion detection systems. In our experiments, we implement existing state-of-the-art models for intrusion detection. We then attack those models with a set of chosen evasion attacks. To detect those adversarial attacks, we design and implement multiple transfer learning-based adversarial detectors, each receiving a subset of the information passed through the IDS. We show that combining the detectors' respective decisions can further improve the detectability of adversarial traffic compared to a single detector in the case of a parallel IDS design.

artificial intelligence, detector, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.future.2022.08.011

2210.157

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(16 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.81)

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

Guo, Haohan, Xie, Fenglong, Wu, Xixin, Lu, Hui, Meng, Helen

arXiv.org Artificial Intelligence, Oct-26-2022

This paper aims to enhance low-resource TTS by reducing training data requirements using compact speech representations. A Multi-Stage Multi-Codebook (MSMC) VQ-GAN is trained to learn the representation, MSMCR, and decode it to waveforms. Subsequently, we train the multi-stage predictor to predict MSMCRs from the text for TTS synthesis. Moreover, we optimize the training strategy by leveraging more audio to learn MSMCRs better for low-resource languages. The strategy selects audio from other languages using a speaker similarity metric to augment the training set, and applies transfer learning to improve training quality. In MOS tests, the proposed system significantly outperforms FastSpeech and VITS in standard and low-resource scenarios, showing lower data requirements. The proposed training strategy effectively enhances MSMCRs on waveform reconstruction. It further improves TTS performance, winning 77% of the votes in the preference test for low-resource TTS with only 15 minutes of paired data.

artificial intelligence, machine learning, utterance, (19 more...)

arXiv.org Artificial Intelligence

2210.15131

Country:

Asia > China > Hong Kong (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.36)

Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention

Chen, Yifan, Hazarika, Devamanyu, Namazifar, Mahdi, Liu, Yang, Jin, Di, Hakkani-Tur, Dilek

arXiv.org Artificial Intelligence, Oct-26-2022

The massive number of trainable parameters in pre-trained language models (PLMs) makes them hard to deploy to multiple downstream tasks. To address this issue, parameter-efficient transfer learning methods have been proposed to tune only a few parameters during fine-tuning while freezing the rest. This paper looks at existing methods along this line through the kernel lens. Motivated by the connection between self-attention in transformer-based PLMs and kernel learning, we propose kernel-wise adapters, namely Kernel-mix, that utilize the kernel structure in self-attention to guide the assignment of the tunable parameters. These adapters use guidelines found in classical kernel learning and enable separate parameter tuning for each attention head. Our empirical results, over a diverse set of natural language generation and understanding tasks, show that our proposed adapters can attain or improve the strong performance of existing baselines.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2205.0372

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.54)
