Cross-Learning from Scarce Data via Multi-Task Constrained Optimization

Agorio, Leopoldo, Cerviño, Juan, Calvo-Fullana, Miguel, Ribeiro, Alejandro, Bazerque, Juan Andrés

arXiv.org Artificial Intelligence 

Abstract: A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires the dataset to be large enough to be representative of the underlying distribution of the source. When data is limited, the learned models fail to generalize to cases not seen during training. This paper introduces a multi-task cross-learning framework to overcome data scarcity by jointly estimating deterministic parameters across multiple related tasks. We formulate this joint estimation as a constrained optimization problem in which the constraints dictate the resulting similarity between the parameters of the different models, allowing the estimated parameters to differ across tasks while still combining information from multiple data sources. This framework enables knowledge transfer from tasks with abundant data to those with scarce data, leading to more accurate and reliable parameter estimates in scenarios where parameter inference from limited data is critical. We provide theoretical guarantees in a controlled framework with Gaussian data, and show the efficiency of our cross-learning method in applications with real data, including image classification and propagation of infectious diseases.

The machine learning problem, in general, involves extracting information from a dataset, which is typically achieved by fitting the parameters of a model [1], whether it be a neural network or a more specific parametric function that incorporates additional knowledge about the data source. Once fitted, this parametric model can be used for classification, prediction, or estimation.
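To make the constrained formulation described in the abstract concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes one particular form of the similarity constraint that the text above does not specify: each task parameter theta_t must stay within a distance eps of a shared central value. It also assumes scalar parameters and a squared-error loss, so that each task's unconstrained estimate is its sample mean. The names `cross_learn_means`, `eps`, and `center` are hypothetical and chosen for illustration only.

```python
import statistics


def cross_learn_means(samples_per_task, eps, steps=2000):
    """Jointly estimate one scalar mean per task (illustrative sketch).

    Assumed problem, not taken from the paper:
        minimize   sum_t (theta_t - xbar_t)^2
        subject to |theta_t - center| <= eps   for every task t,
    where xbar_t is the sample mean of task t.
    """
    xbars = [statistics.fmean(s) for s in samples_per_task]
    center = statistics.fmean(xbars)  # initial guess for the shared center

    # For a fixed center, the best theta_t is xbar_t clipped to
    # [center - eps, center + eps]. Substituting that back leaves a convex
    # function of the center alone:
    #     g(c) = sum_t max(0, |xbar_t - c| - eps)^2,
    # which is minimized here by plain gradient descent.
    lr = 1.0 / (2.0 * len(xbars))  # 1 / (Lipschitz constant of g')
    for _ in range(steps):
        grad = 0.0
        for xb in xbars:
            excess = abs(xb - center) - eps
            if excess > 0:
                direction = 1.0 if xb > center else -1.0
                grad -= 2.0 * excess * direction
        center -= lr * grad

    # Recover the per-task estimates by clipping to the feasible interval.
    thetas = [min(max(xb, center - eps), center + eps) for xb in xbars]
    return thetas, center


if __name__ == "__main__":
    # Hypothetical toy data: one task with more samples, two with fewer.
    data = [[1.0, 1.2, 0.8, 1.1, 0.9], [1.5, 1.7], [0.4, 0.6, 0.5]]
    for eps in (0.0, 0.2, 10.0):
        thetas, center = cross_learn_means(data, eps)
        print(eps, thetas, center)
```

In this sketch, eps = 0 forces all tasks to share a single estimate, while a large eps leaves each task with its own sample mean. Intermediate values pull the estimates toward each other without making them identical, which is the trade-off the constraints are meant to control.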