Hybrid Machine Learning and Deep Learning Models for Cardiovascular Disease Risk Prediction: A Comparative Analysis
Naveen Chadha¹, Dr. Gurpreet Singh², Karuna³
¹Research Scholar, M.Tech. CSE; ²Professor, Department of Computer Science & Engineering, St. Soldier Institute of Engineering & Technology; ³Assistant Professor, Department of Computer Science & Engineering, St. Soldier Institute of Engineering & Technology
Abstract— Cardiovascular disease (CVD) remains a leading cause of mortality worldwide, highlighting the urgent need for reliable predictive models that support early diagnosis and preventive care. This study presents a comparative analysis of machine learning models for CVD risk assessment, focusing on their ability to accurately identify high-risk individuals from clinical and demographic data. We evaluate several supervised learning algorithms (logistic regression, random forest, support vector classifier, K-nearest neighbors, gradient boosting, and AdaBoost) and compare their predictive performance in terms of accuracy, precision, recall, and F1-score. Additionally, we propose a novel hybrid model that uses a Random Forest (RF) for feature selection and a Deep Neural Network (DNN) for classification, aiming to combine the strengths of both approaches for more accurate CVD prediction. Our findings show that ensemble models such as random forest and gradient boosting perform best, reaching an accuracy of 0.99 with balanced precision and recall, and outperform simpler models such as logistic regression and the support vector classifier. The proposed hybrid model achieves 92.4% accuracy, 91.7% precision, 93.0% recall, and an AUC-ROC of 0.960. The analysis also underscores the importance of data preprocessing, including normalization and the handling of missing values, for model accuracy and stability. Notably, K-nearest neighbors also performs very well, with a high F1-score for every class, indicating its robustness for this task. This study examines the strengths and limitations of each model, including the proposed hybrid model, offering practical guidance to healthcare practitioners and data scientists selecting machine learning models for CVD risk prediction.
Integrating these models into healthcare systems could enhance real-time risk prediction, supporting clinical decision-making and advancing personalized care in cardiovascular health.
Keywords— cardiovascular disease, machine learning, risk assessment, predictive models, supervised learning, healthcare
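The models above are compared by accuracy, precision, recall, and F1-score. As a minimal sketch, assuming binary labels where 1 marks the high-risk (CVD) class, these metrics can be computed from confusion-matrix counts as shown below. The function and class names (`binary_metrics`, `f1_from`, `BinaryMetrics`) are hypothetical illustrations and are not taken from the study's implementation; the toy labels are invented for demonstration and are not data from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Accuracy, precision, recall and F1 for the positive (high-risk) class."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Compute metrics treating label 1 as the positive (CVD) class (assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return BinaryMetrics(
        accuracy=(tp + tn) / len(y_true),
        precision=precision,
        recall=recall,
        f1=f1_from(precision, recall),
    )


if __name__ == "__main__":
    # Toy labels for illustration only, not results from this study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
    # F1 implied by the hybrid model's reported precision and recall.
    print(round(f1_from(0.917, 0.930), 3))
```

Applying the F1 formula to the hybrid model's reported precision (91.7%) and recall (93.0%) gives an implied F1-score of about 0.923. For multi-class or per-class reporting, the same counts would be computed once per class, treating that class as positive.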