Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis
Ramesh, Risshab Srinivas; Udupa, Roshani T S; J, Monisha; S, Kushi K K
arXiv.org Artificial Intelligence
Cardiovascular diseases (CVDs) are a leading cause of mortality globally, accounting for 31% of all deaths. This study uses a cardiovascular disease (CVD) dataset comprising 68,119 records to explore the influence of numerical (age, height, weight, blood pressure, BMI) and categorical (gender, cholesterol, glucose, smoking, alcohol, activity) factors on CVD occurrence. We performed statistical analyses, including t-tests, Chi-square tests, and ANOVA, which identify strong associations between CVD and elderly people, hypertension, higher weight, and abnormal cholesterol levels, while physical activity appears as a protective factor. A logistic regression model highlights age, blood pressure, and cholesterol as primary risk factors, with unexpected negative associations for smoking and alcohol, suggesting potential data issues. Model performance comparisons reveal CatBoost as the top performer, with an accuracy of 0.734 and an expected calibration error (ECE) of 0.0064; it also excels in probabilistic prediction (Brier score = 0.1824). Data challenges, including outliers and skewed distributions, indicate a need for improved preprocessing to enhance predictive reliability. Cardiovascular diseases (CVDs) encompass a range of conditions affecting the heart and blood vessels, including coronary heart disease, stroke, and heart failure.
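The abstract reports two probabilistic metrics, ECE and the Brier score, without defining them. The sketch below shows one common way to compute each for a binary classifier. It is an illustration under assumptions, not the authors' code: the function names, the 10 equal-width bins used for ECE, and the toy labels and probabilities are all invented here, and the paper may use a different binning scheme.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """ECE with equal-width bins (an assumed binning scheme).

    Each bin contributes |observed positive rate - mean predicted probability|,
    weighted by the fraction of samples that fall into the bin.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


# Toy labels and predicted probabilities (hypothetical, not from the paper).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.6, 0.9]

print("Brier score:", brier_score(y_true, y_prob))
print("ECE:", expected_calibration_error(y_true, y_prob))
```

Lower is better for both metrics. The Brier score penalizes every individual probability that is far from the outcome, while ECE only measures whether predicted probabilities match observed frequencies on average within each bin.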
Jul-30-2025
- Country:
  - Asia
    - China (0.04)
    - India
      - Karnataka > Bengaluru (0.14)
      - Maharashtra > Pune (0.04)
      - Puducherry (0.04)
    - Middle East > Saudi Arabia
      - Riyadh Province > Riyadh (0.04)
  - Europe > Switzerland
    - Basel-City > Basel (0.04)
  - North America > United States
    - Florida > Orange County > Orlando (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.88)
    - New Finding (0.68)
- Industry:
- Technology: