Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning
Azad, Fatemeh, Bosnić, Zoran, Kukar, Matjaž
arXiv.org Artificial Intelligence
Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. The proposed method, called Meta-Imputation Balanced (MIB), is trained on synthetically masked data with known ground truth, so that it learns to predict the most suitable imputed value based on the behavior of each base imputer. We evaluate the method on tabular data under the Missing Completely at Random (MCAR) assumption using both direct and indirect metrics. The direct metrics are the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) between imputed values and their original ground-truth values at the artificially masked positions; the indirect metric is the RMSE of a target variable predicted by machine learning models trained on the imputed datasets. Across three benchmark datasets, MIB achieved the lowest or near-lowest RMSE and delivered stable downstream predictive performance, even when individual imputers varied in performance.
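The training idea described in the abstract (mask observed entries, run several base imputers, learn a combiner against the known ground truth, then score with MAE and RMSE at held-out masked positions) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the base imputers or the meta-learner, so the mean, median, and simple nearest-neighbour imputers, the linear least-squares combiner, the synthetic data, and every function name below are assumptions chosen for illustration.

```python
import numpy as np

def mean_impute(X):
    """Fill NaNs with the column mean of the observed entries."""
    out = X.copy()
    rows, cols = np.where(np.isnan(X))
    out[rows, cols] = np.nanmean(X, axis=0)[cols]
    return out

def median_impute(X):
    """Fill NaNs with the column median of the observed entries."""
    out = X.copy()
    rows, cols = np.where(np.isnan(X))
    out[rows, cols] = np.nanmedian(X, axis=0)[cols]
    return out

def knn_impute(X, k=5):
    """Fill NaNs with the average of the k nearest rows.

    Distances are computed on a mean-filled copy (a simplification).
    """
    filled = mean_impute(X)
    out = X.copy()
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        dist = np.linalg.norm(filled - filled[i], axis=1)
        dist[i] = np.inf  # exclude the row itself
        neighbours = np.argsort(dist)[:k]
        missing = np.isnan(X[i])
        out[i, missing] = filled[neighbours][:, missing].mean(axis=0)
    return out

# Hypothetical choice of base imputers; the paper may use different ones.
BASE_IMPUTERS = [mean_impute, median_impute, knn_impute]

def mcar_mask(X, rate, rng):
    """Select roughly `rate` of the observed entries uniformly at random (MCAR)."""
    return (~np.isnan(X)) & (rng.random(X.shape) < rate)

def base_features(X, positions):
    """Intercept column plus each base imputer's value at `positions`."""
    cols = [imputer(X)[positions] for imputer in BASE_IMPUTERS]
    return np.column_stack([np.ones(positions.sum())] + cols)

def fit_meta(X, rate, rng):
    """Mask observed entries, then fit a linear combiner to the hidden truth."""
    train_mask = mcar_mask(X, rate, rng)
    X_masked = X.copy()
    X_masked[train_mask] = np.nan
    A = base_features(X_masked, train_mask)
    weights, *_ = np.linalg.lstsq(A, X[train_mask], rcond=None)
    return weights

def meta_impute(X, weights):
    """Fill the missing entries with the learned combination of base imputers."""
    missing = np.isnan(X)
    out = X.copy()
    out[missing] = base_features(X, missing) @ weights
    return out

def mae(pred, truth):
    return float(np.mean(np.abs(pred - truth)))

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Synthetic correlated table standing in for a real dataset (assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
X_true = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(300, 6))

# Hide 20% of the entries as the evaluation set with known ground truth.
test_mask = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[test_mask] = np.nan

weights = fit_meta(X_obs, rate=0.2, rng=rng)
X_meta = meta_impute(X_obs, weights)

# Direct metrics at the artificially masked positions.
scores = {f.__name__: rmse(f(X_obs)[test_mask], X_true[test_mask])
          for f in BASE_IMPUTERS}
scores["meta"] = rmse(X_meta[test_mask], X_true[test_mask])
meta_mae = mae(X_meta[test_mask], X_true[test_mask])
print(scores, meta_mae)
```

A single global linear combiner is the simplest possible meta-learner; it is used here only to make the masking-and-stacking loop concrete. The indirect evaluation mentioned in the abstract would additionally train a predictive model on each imputed table and compare the RMSE of its target predictions.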
Sep-4-2025
- Country:
- Asia > China
- Europe > Slovenia
- Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Health & Medicine > Therapeutic Area (0.72)