Comprehensive Data Analysis and Machine Learning for Cardiovascular Disease Prediction
K. Satish Babu1, G. Navya Padma Sri2, V. Yaswanth3, V. Shiva Sai Ram4
1 Sr. Asst. Professor, Dept. of Electronics and Communication Engineering, Geethanjali College of Engineering and Technology, Telangana, India
2,3,4 Students, Dept. of Electronics and Communication Engineering, Geethanjali College of Engineering and Technology, Telangana, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – Cardiovascular diseases (CVDs) are a leading cause of mortality worldwide, which makes early and accurate risk detection systems essential. This project presents a data-driven machine learning approach to predicting CVDs that draws on diverse cardiovascular health datasets and evaluates multiple classifiers, including Decision Trees, Random Forests, and Support Vector Machines (SVM). The methodology involves rigorous data preprocessing (handling missing values, treating outliers, and normalizing features), followed by feature selection to identify significant clinical indicators, such as BMI. Model performance is assessed using accuracy, precision, recall, and F1-score, and hyperparameters are tuned with GridSearchCV to improve predictive performance. Interpretability is addressed through feature importance and SHAP values so that healthcare professionals can understand how the model arrives at its predictions. Among the evaluated classifiers, the Random Forest model achieved the highest accuracy, 89%, making it the most effective in this context. The system supports early identification of at-risk individuals, enabling proactive healthcare strategies and personalized interventions. By combining clinical relevance with a robust algorithmic framework, this project bridges data science and medical insight, supporting early diagnosis and improving outcomes in cardiovascular disease management.
Key Words: Cardiovascular Disease Prediction, Machine Learning, Data Analysis, Decision Trees, Random Forest, Support Vector Machines, Predictive Modeling, Interpretability, Early Detection, Proactive Healthcare.
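The four evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy label vectors below are assumptions made for illustration only; they are not the project's code, and the numbers are not its results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier.

    `positive` is the label treated as "disease present" (an assumption here).
    A metric whose denominator is zero is reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (hypothetical): 1 = CVD present, 0 = CVD absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so accuracy is 0.8 while precision, recall, and F1 are each 0.75. In a screening setting recall is often worth watching separately from accuracy, since a missed at-risk patient (a false negative) is usually the costlier error.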