AITopics | Transfer Learning

Collaborating Authors

Transfer Learning

Transfer Learning is the reuse of a pre-trained model on a new problem. (Towards Data Science)

News Overviews Instructional Materials AI-Alerts Classics

FonMTL: Towards Multitask Learning for the Fon Language

Dossou, Bonaventure F. P., Houndayi, Iffanice, Zantou, Pamely, Hacheme, Gilles

arXiv.org Artificial IntelligenceSep-11-2023

The Fon language, spoken by an average 2 million of people, is a truly low-resourced African language, with a limited online presence, and existing datasets (just to name but a few). Multitask learning is a learning paradigm that aims to improve the generalization capacity of a model by sharing knowledge across different but related tasks: this could be prevalent in very data-scarce scenarios. In this paper, we present the first explorative approach to multitask learning, for model capabilities enhancement in Natural Language Processing for the Fon language. Specifically, we explore the tasks of Named Entity Recognition (NER) and Part of Speech Tagging (POS) for Fon. We leverage two language model heads as encoders to build shared representations for the inputs, and we use linear layers blocks for classification relative to each task. Our results on the NER and POS tasks for Fon, show competitive (or better) performances compared to several multilingual pretrained language models finetuned on single tasks. Additionally, we perform a few ablation studies to leverage the efficiency of two different loss combination strategies and find out that the equal loss weighting approach works best in our case. Our code is open-sourced at https://github.com/bonaventuredossou/multitask_fon.

mtl weighted, representation, single task, (14 more...)

arXiv.org Artificial Intelligence

2308.1428

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
Asia > Indonesia > Bali (0.05)
Africa > Benin (0.05)
(4 more...)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)

Add feedback

Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

Yang, Xuwei, Kratsios, Anastasis, Krach, Florian, Grasselli, Matheus, Lucchi, Aurelien

arXiv.org Artificial IntelligenceSep-8-2023

We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets ${\cal D}_1,\dots,{\cal D}_N$ for the same learning model $f_{\theta}$. Our objective is to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $\theta^\star_{1},\ldots,\theta^\star_N$ obtained for each dataset, while respecting the loss function for the model $f_{\theta(T)}$ produced by the algorithm upon halting. We only allow for continual communication between each of the specialized models (nodes/agents) and the central planner (server), at each iteration (round). For the case where the model $f_{\theta}$ is a finite-rank kernel regression, we derive explicit updates for the regret-optimal algorithm. By leveraging symmetries within the regret-optimal algorithm, we further develop a nearly regret-optimal heuristic that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space. Additionally, we investigate the adversarial robustness of the regret-optimal algorithm showing that an adversary which perturbs $q$ training pairs by at-most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. To validate our theoretical findings, we conduct numerical experiments in the context of American option pricing, utilizing a randomly generated finite-rank kernel.

algorithm, complexity, dataset, (14 more...)

arXiv.org Artificial Intelligence

2309.04557

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)

Add feedback

EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System

Lu, Chengjie, Xu, Qinghua, Yue, Tao, Ali, Shaukat, Schwitalla, Thomas, Nygård, Jan F.

arXiv.org Artificial IntelligenceSep-6-2023

The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, and hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, CCDT should evolve as well to synchronize with GURI. A key challenge of achieving such synchronization is that evolving CCDT needs abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which considers the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with the dataset labelled by querying a new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and query GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL increases its performances consistently.

active transfer learning, automated cancer registry system, evolving cyber-cyber digital twin, (1 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3611643.3613897

2309.03246

Country: Europe > Norway (0.44)

Genre: Research Report (0.69)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.60)

Add feedback

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

Liu, Jihao, Wang, Tai, Liu, Boxiao, Zhang, Qihang, Liu, Yu, Li, Hongsheng

arXiv.org Artificial IntelligenceAug-28-2023

Multi-view camera-based 3D detection is a challenging problem in computer vision. Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network. However, we argue that there is a major domain gap between the LiDAR BEV features and the camera-based BEV features, as they have different characteristics and are derived from different sources. In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving the multi-view camera-based 3D detection. GeoMIM is a multi-camera vision transformer with Cross-View Attention (CVA) blocks that uses LiDAR BEV features encoded by the pretrained BEV model as learning targets. During pretraining, GeoMIM's decoder has a semantic branch completing dense perspective-view features and the other geometry branch reconstructing dense perspective-view depth maps. The depth branch is designed to be camera-aware by inputting the camera's parameters for better transfer capability. Extensive results demonstrate that GeoMIM outperforms existing methods on nuScenes benchmark, achieving state-of-the-art performance for camera-based 3D object detection and 3D segmentation. Code and pretrained models are available at https://github.com/Sense-X/GeoMIM.

artificial intelligence, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.11325

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.34)

Add feedback

Boosting Multitask Learning on Graphs through Higher-Order Task Affinities

Li, Dongyue, Ju, Haotian, Sharma, Aneesh, Zhang, Hongyang R.

arXiv.org Artificial IntelligenceAug-26-2023

Predicting node labels on a given graph is a widely studied problem with many applications, including community detection and molecular graph prediction. This paper considers predicting multiple node labeling functions on graphs simultaneously and revisits this problem from a multitask learning perspective. For a concrete example, consider overlapping community detection: each community membership is a binary node classification task. Due to complex overlapping patterns, we find that negative transfer is prevalent when we apply naive multitask learning to multiple community detection, as task relationships are highly nonlinear across different node labeling. To address the challenge, we develop an algorithm to cluster tasks into groups based on a higher-order task affinity measure. We then fit a multitask model on each task group, resulting in a boosting procedure on top of the baseline model. We estimate the higher-order task affinity measure between two tasks as the prediction loss of one task in the presence of another task and a random subset of other tasks. Then, we use spectral clustering on the affinity score matrix to identify task grouping. We design several speedup techniques to compute the higher-order affinity scores efficiently and show that they can predict negative transfers more accurately than pairwise task affinities. We validate our procedure using various community detection and molecular graph prediction data sets, showing favorable results compared with existing methods. Lastly, we provide a theoretical analysis to show that under a planted block model of tasks on graphs, our affinity scores can provably separate tasks into groups.

affinity score, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3580305.3599265

2306.14009

Country:

North America > United States > California > Los Angeles County > Long Beach (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Add feedback

The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression

Dar, Yehuda, LeJeune, Daniel, Baraniuk, Richard G.

arXiv.org Artificial IntelligenceAug-23-2023

We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using its training data together with the parameters previously computed for the source task. We define a transfer learning approach to the target task as a linear regression optimization with a regularization on the distance between the to-be-learned target parameters and the already-learned source parameters. We analytically characterize the generalization performance of our transfer learning approach and demonstrate its ability to resolve the peak in generalization errors in double descent phenomena of the minimum L2-norm solution to linear regression. Moreover, we show that for sufficiently related tasks, the optimally tuned transfer learning approach can outperform the optimally tuned ridge regression method, even when the true parameter vector conforms to an isotropic Gaussian prior distribution. Namely, we demonstrate that transfer learning can beat the minimum mean square error (MMSE) solution of the independent target task. Our results emphasize the ability of transfer learning to extend the solution space to the target task and, by that, to have an improved MMSE solution. We formulate the linear MMSE solution to our transfer learning setting and point out its key differences from the common design philosophy to transfer learning.

artificial intelligence, machine learning, target task, (16 more...)

arXiv.org Artificial Intelligence

2103.05621

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > Hungary > Csongrád-Csanád County > Szeged (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Improving automatic endoscopic stone recognition using a multi-view fusion approach enhanced with two-step transfer learning

Lopez-Tiro, Francisco, Villalvazo-Avila, Elias, Betancur-Rengifo, Juan Pablo, Reyes-Amezcua, Ivan, Hubert, Jacques, Ochoa-Ruiz, Gilberto, Daul, Christian

arXiv.org Artificial IntelligenceAug-22-2023

This contribution presents a deep-learning method for extracting and fusing image information acquired from different viewpoints, with the aim to produce more discriminant object features for the identification of the type of kidney stones seen in endoscopic images. The model was further improved with a two-step transfer learning approach and by attention blocks to refine the learned feature maps. Deep feature fusion strategies improved the results of single view extraction backbone models by more than 6% in terms of accuracy of the kidney stones classification.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2304.03193

Country:

North America > United States (0.14)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
North America > Mexico > Jalisco > Guadalajara (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Urology (0.76)
Health & Medicine > Therapeutic Area > Nephrology (0.75)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.63)

Add feedback

Disposable Transfer Learning for Selective Source Task Unlearning

Koh, Seunghee, Shon, Hyounguk, Lee, Janghyeon, Hong, Hyeong Gwon, Kim, Junmo

arXiv.org Artificial IntelligenceAug-19-2023

Transfer learning is widely used for training deep neural networks (DNN) for building a powerful representation. Even after the pre-trained model is adapted for the target task, the representation performance of the feature extractor is retained to some extent. As the performance of the pre-trained model can be considered the private property of the owner, it is natural to seek the exclusive right of the generalized performance of the pre-trained weight. To address this issue, we suggest a new paradigm of transfer learning called disposable transfer learning (DTL), which disposes of only the source task without degrading the performance of the target task. To achieve knowledge disposal, we propose a novel loss named Gradient Collision loss (GC loss). GC loss selectively unlearns the source knowledge by leading the gradient vectors of mini-batches in different directions. Whether the model successfully unlearns the source task is measured by piggyback learning accuracy (PL accuracy). PL accuracy estimates the vulnerability of knowledge leakage by retraining the scrubbed model on a subset of source data or new downstream data. We demonstrate that GC loss is an effective approach to the DTL problem by showing that the model trained with GC loss retains the performance on the target task with a significantly reduced PL accuracy.

accuracy, knowledge, pl accuracy, (16 more...)

arXiv.org Artificial Intelligence

2308.09971

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > South Korea (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Africa > Ethiopia (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning

Li, Xingyu, Tang, Bo, Li, Haifeng

arXiv.org Artificial IntelligenceAug-19-2023

Continual lifelong learning is an machine learning framework inspired by human learning, where learners are trained to continuously acquire new knowledge in a sequential manner. However, the non-stationary nature of streaming training data poses a significant challenge known as catastrophic forgetting, which refers to the rapid forgetting of previously learned knowledge when new tasks are introduced. While some approaches, such as experience replay (ER), have been proposed to mitigate this issue, their performance remains limited, particularly in the class-incremental scenario which is considered natural and highly challenging. In this paper, we present a novel algorithm, called adaptive-experience replay (AdaER), to address the challenge of continual lifelong learning. AdaER consists of two stages: memory replay and memory update. In the memory replay stage, AdaER introduces a contextually-cued memory recall (C-CMR) strategy, which selectively replays memories that are most conflicting with the current input data in terms of both data and task. Additionally, AdaER incorporates an entropy-balanced reservoir sampling (E-BRS) strategy to enhance the performance of the memory buffer by maximizing information entropy. To evaluate the effectiveness of AdaER, we conduct experiments on established supervised continual lifelong learning benchmarks, specifically focusing on class-incremental learning scenarios. The results demonstrate that AdaER outperforms existing continual lifelong learning baselines, highlighting its efficacy in mitigating catastrophic forgetting and improving learning performance.

artificial intelligence, learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2308.0381

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
(8 more...)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.48)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.66)

Add feedback

Multi-task Representation Learning with Stochastic Linear Bandits

Cella, Leonardo, Lounici, Karim, Pacreau, Grégoire, Pontil, Massimiliano

arXiv.org Artificial IntelligenceAug-15-2023

We study the problem of transfer-learning in the setting of stochastic linear bandit tasks. We consider that a low dimensional linear representation is shared across the tasks, and study the benefit of learning this representation in the multi-task learning setting. Following recent results to design stochastic bandit policies, we propose an efficient greedy policy based on trace norm regularization. It implicitly learns a low dimensional representation by encouraging the matrix formed by the task regression vectors to be of low rank. Unlike previous work in the literature, our policy does not need to know the rank of the underlying matrix. We derive an upper bound on the multi-task regret of our policy, which is, up to logarithmic factors, of order $\sqrt{NdT(T+d)r}$, where $T$ is the number of tasks, $r$ the rank, $d$ the number of variables and $N$ the number of rounds per task. We show the benefit of our strategy compared to the baseline $Td\sqrt{N}$ obtained by solving each task independently. We also provide a lower bound to the multi-task regret. Finally, we corroborate our theoretical findings with preliminary experiments on synthetic data.

artificial intelligence, machine learning, matrix, (12 more...)

arXiv.org Artificial Intelligence

2202.10066

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.48)

Add feedback