Goto

Collaborating Authors

 Transfer Learning


Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

arXiv.org Artificial Intelligence

We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition, which features complex rhythmic patterns that challenge conventional models. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments. Our results demonstrate significant improvements over baseline performance, with F1 scores reaching up to 0.998 in the intra-task setting and improvements of over 50 percentage points in best-case scenarios. The cross-task adaptation proves particularly effective for time-keeping instruments, where onsets naturally align with beat positions. The optimal fine-tuning configuration varies by instrument, highlighting the importance of instrument-specific adaptation strategies. This approach addresses the challenges of underrepresented musical traditions, offering an efficient human-in-the-loop methodology that minimizes annotation effort while maximizing performance. Our findings contribute to more inclusive music information retrieval tools applicable beyond Western musical contexts.


Transfer Learning in Infinite Width Feature Learning Networks

arXiv.org Machine Learning

We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) task can operate in a feature learning regime. We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly initialized networks trained with weight decay. Both settings track how representations evolve in both source and target tasks. The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks. Reuse of features during transfer learning is controlled by an elastic weight coupling which controls the reliance of the network on features learned during training on the source task. We apply our theory to linear and polynomial regression tasks as well as real datasets. Our theory and experiments reveal interesting interplays between elastic weight coupling, feature learning strength, dataset size, and source and target task alignment on the utility of transfer learning.


Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning

arXiv.org Machine Learning

Theoretical works on supervised transfer learning (STL) -- where the learner has access to labeled samples from both source and target distributions -- have for the most part focused on statistical aspects of the problem, while efficient optimization has received less attention. We consider the problem of designing an SGD procedure for STL that alternates sampling between source and target data, while maintaining statistical transfer guarantees without prior knowledge of the quality of the source data. A main algorithmic difficulty is in understanding how to design such an adaptive sub-sampling mechanism at each SGD step, to automatically gain from the source when it is informative, or bias towards the target and avoid negative transfer when the source is less informative. We show that, such a mixed-sample SGD procedure is feasible for general prediction tasks with convex losses, rooted in tracking an abstract sequence of constrained convex programs that serve to maintain the desired transfer guarantees. We instantiate these results in the concrete setting of linear regression with square loss, and show that the procedure converges, with $1/\sqrt{T}$ rate, to a solution whose statistical performance on the target is adaptive to the a priori unknown quality of the source. Experiments with synthetic and real datasets support the theory.


Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains

arXiv.org Artificial Intelligence

Transfer learning in Reinforcement Learning (RL) enables agents to leverage knowledge from source tasks to accelerate learning in target tasks. While prior work, such as the Attend, Adapt, and Transfer (A2T) framework, addresses negative transfer and selective transfer, other critical challenges remain underexplored. This paper introduces the Generalized Adaptive Transfer Network (GATN), a deep RL architecture designed to tackle task generalization across domains, robustness to environmental changes, and computational efficiency in transfer. GATN employs a domain-agnostic representation module, a robustness-aware policy adapter, and an efficient transfer scheduler to achieve these goals. We evaluate GATN on diverse benchmarks, including Atari 2600, MuJoCo, and a custom chatbot dialogue environment, demonstrating superior performance in cross-domain generalization, resilience to dynamic environments, and reduced computational overhead compared to baselines. Our findings suggest GATN is a versatile framework for real-world RL applications, such as adaptive chatbots and robotic control.


Transfer Learning for Matrix Completion

arXiv.org Machine Learning

In this paper, we explore the knowledge transfer under the setting of matrix completion, which aims to enhance the estimation of a low-rank target matrix with auxiliary data available. We propose a transfer learning procedure given prior information on which source datasets are favorable. We study its convergence rates and prove its minimax optimality. Our analysis reveals that with the source matrices close enough to the target matrix, out method outperforms the traditional method using the single target data. In particular, we leverage the advanced sharp concentration inequalities introduced in \cite{brailovskaya2024universality} to eliminate a logarithmic factor in the convergence rate, which is crucial for proving the minimax optimality. When the relevance of source datasets is unknown, we develop an efficient detection procedure to identify informative sources and establish its selection consistency. Simulations and real data analysis are conducted to support the validity of our methodology.


Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data

arXiv.org Artificial Intelligence

This paper introduces Smooth-Distill, a novel self-distillation framework designed to simultaneously perform human activity recognition (HAR) and sensor placement detection using wearable sensor data. The proposed approach utilizes a unified CNN-based architecture, MTL-net, which processes accelerometer data and branches into two outputs for each respective task. Unlike conventional distillation methods that require separate teacher and student models, the proposed framework utilizes a smoothed, historical version of the model itself as the teacher, significantly reducing training computational overhead while maintaining performance benefits. To support this research, we developed a comprehensive accelerometer-based dataset capturing 12 distinct sleep postures across three different wearing positions, complementing two existing public datasets (MHealth and WISDM). Experimental results show that Smooth-Distill consistently outperforms alternative approaches across different evaluation scenarios, achieving notable improvements in both human activity recognition and device placement detection tasks. This method demonstrates enhanced stability in convergence patterns during training and exhibits reduced overfitting compared to traditional multitask learning baselines. This framework contributes to the practical implementation of knowledge distillation in human activity recognition systems, offering an effective solution for multitask learning with accelerometer data that balances accuracy and training efficiency. More broadly, it reduces the computational cost of model training, which is critical for scenarios requiring frequent model updates or training on resource-constrained platforms. The code and model are available at https://github.com/Kuan2vn/smooth\_distill.


Are Fast Methods Stable in Adversarially Robust Transfer Learning?

arXiv.org Machine Learning

Transfer learning is often used to decrease the computational cost of model training, as fine-tuning a model allows a downstream task to leverage the features learned from the pre-training dataset and quickly adapt them to a new task. This is particularly useful for achieving adversarial robustness, as adversarially training models from scratch is very computationally expensive. However, high robustness in transfer learning still requires adversarial training during the fine-tuning phase, which requires up to an order of magnitude more time than standard fine-tuning. In this work, we revisit the use of the fast gradient sign method (FGSM) in robust transfer learning to improve the computational cost of adversarial fine-tuning. We surprisingly find that FGSM is much more stable in adversarial fine-tuning than when training from scratch. In particular, FGSM fine-tuning does not suffer from any issues with catastrophic overfitting at standard perturbation budgets of $\varepsilon=4$ or $\varepsilon=8$. This stability is further enhanced with parameter-efficient fine-tuning methods, where FGSM remains stable even up to $\varepsilon=32$ for linear probing. We demonstrate how this stability translates into performance across multiple datasets. Compared to fine-tuning with the more commonly used method of projected gradient descent (PGD), on average, FGSM only loses 0.39% and 1.39% test robustness for $\varepsilon=4$ and $\varepsilon=8$ while using $4\times$ less training time. Surprisingly, FGSM may not only be a significantly more efficient alternative to PGD in adversarially robust transfer learning but also a well-performing one.


Multi-Environment GLAMP: Approximate Message Passing for Transfer Learning with Applications to Lasso-based Estimators

arXiv.org Machine Learning

Approximate Message Passing (AMP) algorithms enable precise characterization of certain classes of random objects in the high-dimensional limit, and have found widespread applications in fields such as signal processing, statistics, and communications. In this work, we introduce Multi-Environment Generalized Long AMP, a novel AMP framework that applies to transfer learning problems with multiple data sources and distribution shifts. We rigorously establish state evolution for multi-environment GLAMP. We demonstrate the utility of this framework by precisely characterizing the risk of three Lasso-based transfer learning estimators for the first time: the Stacked Lasso, the Model Averaging Estimator, and the Second Step Estimator. We also demonstrate the remarkable finite sample accuracy of our theory via extensive simulations.


Predicting Onflow Parameters Using Transfer Learning for Domain and Task Adaptation

arXiv.org Artificial Intelligence

Determining onflow parameters is crucial from the perspectives of wind tunnel testing and regular flight and wind turbine operations. These parameters have traditionally been predicted via direct measurements which might lead to challenges in case of sensor faults. Alternatively, a data-driven prediction model based on surface pressure data can be used to determine these parameters. It is essential that such predictors achieve close to real-time learning as dictated by practical applications such as monitoring wind tunnel operations or learning the variations in aerodynamic performance of aerospace and wind energy systems. To overcome the challenges caused by changes in the data distribution as well as in adapting to a new prediction task, we propose a transfer learning methodology to predict the onflow parameters, specifically angle of attack and onflow speed. It requires first training a convolutional neural network (ConvNet) model offline for the core prediction task, then freezing the weights of this model except the selected layers preceding the output node, and finally executing transfer learning by retraining these layers. A demonstration of this approach is provided using steady CFD analysis data for an airfoil for i) domain adaptation where transfer learning is performed with data from a target domain having different data distribution than the source domain and ii) task adaptation where the prediction task is changed. Further exploration on the influence of noisy data, performance on an extended domain, and trade studies varying sampling sizes and architectures are provided. Results successfully demonstrate the potential of the approach for adaptation to changing data distribution, domain extension, and task update while the application for noisy data is concluded to be not as effective.


A Transfer Learning Framework for Multilayer Networks via Model Averaging

arXiv.org Machine Learning

Link prediction in multilayer networks is a key challenge in applications such as recommendation systems and protein-protein interaction prediction. While many techniques have been developed, most rely on assumptions about shared structures and require access to raw auxiliary data, limiting their practicality. To address these issues, we propose a novel transfer learning framework for multilayer networks using a bi-level model averaging method. A $K$-fold cross-validation criterion based on edges is used to automatically weight inter-layer and intra-layer candidate models. This enables the transfer of information from auxiliary layers while mitigating model uncertainty, even without prior knowledge of shared structures. Theoretically, we prove the optimality and weight convergence of our method under mild conditions. Computationally, our framework is efficient and privacy-preserving, as it avoids raw data sharing and supports parallel processing across multiple servers. Simulations show our method outperforms others in predictive accuracy and robustness. We further demonstrate its practical value through two real-world recommendation system applications.