Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
Liu, Jialei; Liao, Jun; Fang, Kuangnan
Positive-Unlabeled (PU) learning presents unique challenges because no explicitly labeled negative samples are available, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning framework based on model averaging that integrates information from heterogeneous data sources (fully binary labeled, semi-supervised, and PU data sets) without direct data sharing. For each type of source domain, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. Optimal weights for combining the source models are determined by a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence that cover both misspecified and correctly specified target models, and we extend them to high-dimensional settings using sparsity-penalized estimators. Extensive simulations and analyses of real-world credit risk data demonstrate that our method outperforms competing methods in predictive accuracy and robustness, especially when labeled data are limited and the environments are heterogeneous.
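As a rough illustration of the weight-selection step only, the sketch below chooses simplex-constrained weights that minimize the cross-entropy of the averaged prediction on held-out data. For fixed labels this is the same as minimizing the Kullback-Leibler divergence up to a constant. It is not the authors' estimator. The function name `kl_averaging_weights`, the inputs `P` (held-out predicted probabilities, one column per candidate model) and `y` (binary responses), the choice to average predicted probabilities rather than coefficients, and the exponentiated-gradient solver are all assumptions made for illustration, and the PU-specific likelihood is not reproduced.

```python
import numpy as np

def kl_averaging_weights(P, y, n_iter=2000, lr=0.5, eps=1e-8):
    """Hypothetical helper: simplex weights minimizing held-out cross-entropy.

    P : (n, K) array of held-out predicted probabilities, one column per model.
    y : (n,) array of binary responses (0/1).
    """
    P = np.clip(np.asarray(P, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    n, K = P.shape
    w = np.full(K, 1.0 / K)              # start from equal weights
    for _ in range(n_iter):
        p = np.clip(P @ w, eps, 1 - eps)  # averaged prediction
        # Gradient of the mean cross-entropy with respect to w.
        grad = -(P.T @ ((y - p) / (p * (1 - p)))) / n
        # Exponentiated-gradient step; shifting by grad.min() avoids overflow
        # and does not change the result after normalization.
        w = w * np.exp(-lr * (grad - grad.min()))
        w /= w.sum()                      # stay on the probability simplex
    return w
```

In a cross-validation setting, each column of `P` would hold out-of-fold predictions from one candidate model, so that the weights are judged on data the models did not see during fitting.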
Nov-17-2025
- Country:
- Asia > China (0.68)
- North America > United States (0.46)
- Genre:
- Research Report > New Finding (0.88)
- Industry:
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.68)