Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest
Dolgopolyi, Roman, Amaslidou, Ioanna, Margaritou, Agrippina
arXiv.org Artificial Intelligence
Life expectancy is a fundamental indicator of population health and socio-economic well-being, yet accurately forecasting it remains challenging due to the interplay of demographic, environmental, and healthcare factors. This study evaluates three machine learning models, Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF), using a real-world dataset drawn from World Health Organization (WHO) and United Nations (UN) sources. After extensive preprocessing to address missing values and inconsistencies, each model's performance was assessed with R², Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Results show that RF achieves the highest predictive accuracy (R² = 0.9423), significantly outperforming LR and RDT. Interpretability was prioritized through p-values for LR and feature-importance metrics for the tree-based models, revealing immunization rates (diphtheria, measles) and demographic attributes (HIV/AIDS, adult mortality) as critical drivers of life-expectancy predictions. These insights underscore the synergy between ensemble methods and transparency in addressing public-health challenges. Future research should explore advanced imputation strategies, alternative algorithms (e.g., neural networks), and updated data to further refine predictive accuracy and support evidence-based policymaking in global health contexts.
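The three evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function name `regression_metrics` and the sample life-expectancy values are hypothetical, chosen only to show how the coefficient of determination, MAE, and RMSE are computed from true and predicted values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, mae, rmse) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual and total sums of squares.
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("the score is undefined when all true values are identical")

    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    mae = sum(abs(r) for r in residuals) / n  # Mean Absolute Error
    rmse = math.sqrt(ss_res / n)            # Root Mean Squared Error
    return r2, mae, rmse


# Hypothetical values in years, for illustration only (not from the paper).
y_true = [70.0, 65.0, 80.0, 75.0]
y_pred = [72.0, 64.0, 79.0, 75.0]
r2, mae, rmse = regression_metrics(y_true, y_pred)
print(f"R2={r2:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```

A higher coefficient of determination and lower MAE and RMSE indicate a better fit; MAE and RMSE are expressed in the same unit as the target (here, years), with RMSE penalizing large errors more heavily.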
Oct-2-2025