Multimodal Heart Disease Prediction System Using Tabular and Image Data
Koustav Podder1, Shubhradip Saha2, Sudipta Kumar Dutta3
1,2,3 BP Poddar Institute of Management and Technology, Kolkata, India
Abstract– Cardiovascular diseases remain one of the leading causes of mortality worldwide, making early and reliable risk prediction a critical requirement in modern healthcare systems. Traditional diagnostic approaches often rely on either clinical parameters or medical imaging in isolation, which may limit predictive accuracy. This project builds a multimodal heart disease prediction framework that integrates clinical tabular data and echocardiography-based image information to robustly estimate the probability of heart disease. The tabular dataset includes demographic and clinical risk factors such as age, sex, chest pain type, blood pressure, cholesterol, electrocardiographic findings, and exercise-related parameters. Multiple classical machine learning models were trained and evaluated on the tabular data, including Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, and XGBoost. Among these, XGBoost achieved the best performance, with a test accuracy of 89.13% and an F1-score of 0.9029. For the imaging modality, an echocardiography video-based model was trained using the EchoNet-Dynamic dataset, where cardiac abnormality was defined using an ejection fraction threshold of 50%. The imaging model achieved a ROC-AUC of 0.915, indicating strong discriminative performance. Optimizing the decision threshold improved clinical sensitivity, raising recall from 66.77% to 72.70% and reducing the number of false negatives. A late fusion approach using weighted averaging of predicted probabilities was adopted to combine both modalities, leveraging their complementary strengths. The system outputs a probabilistic risk score rather than a definitive diagnosis, making it suitable as a clinical decision support tool. Future enhancements include the integration of a retrieval-augmented generation (RAG) assistant for user guidance and the adoption of explainable AI techniques to improve transparency and interpretability.
Key Words: Heart Disease Prediction, Multimodal Learning, Echocardiography, Machine Learning, Deep Learning, Late Fusion, Clinical Decision Support, Prediction Threshold
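
The three mechanisms named in the abstract (ejection-fraction labeling, late fusion by weighted averaging, and decision-threshold adjustment) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the equal default weight of 0.5, and the convention that an ejection fraction strictly below 50% counts as abnormal are all assumptions made for illustration only; the abstract does not state the fusion weights or the direction of the inequality.

```python
def label_from_ef(ef_percent, threshold=50.0):
    """Return 1 (abnormal) if ejection fraction is below the threshold, else 0.

    Assumption: "below 50%" is treated as abnormal; the source only states
    that a 50% threshold was used.
    """
    return 1 if ef_percent < threshold else 0


def late_fusion(p_tabular, p_image, w_tabular=0.5):
    """Combine two predicted probabilities by weighted averaging.

    w_tabular is a hypothetical weight in [0, 1]; the image model receives
    the remaining weight (1 - w_tabular).
    """
    if not 0.0 <= w_tabular <= 1.0:
        raise ValueError("w_tabular must lie in [0, 1]")
    for p in (p_tabular, p_image):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return w_tabular * p_tabular + (1.0 - w_tabular) * p_image


def recall_at_threshold(probs, labels, threshold):
    """Recall (sensitivity) when predicting positive for probs >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Toy values, not data from the paper.
    fused = late_fusion(p_tabular=0.8, p_image=0.6, w_tabular=0.5)
    print(f"fused risk score: {fused:.2f}")

    probs = [0.9, 0.45, 0.3, 0.2]
    labels = [1, 1, 0, 1]
    for t in (0.5, 0.4):
        print(f"recall at threshold {t}: {recall_at_threshold(probs, labels, t):.2f}")
```

In this toy example, lowering the decision threshold converts one false negative into a true positive, which is the kind of recall gain the abstract reports. The usual cost is more false positives, so in practice the threshold would be chosen on a validation set against both criteria.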