Smart Diagnosis: Enhancing Disease Prediction Accuracy with Hybrid Machine Learning Models
Prof. Yogesh Handge
Dept. of Computer Engineering Pune Institute of Computer Technology
Pune, India yahandge@pict.edu
Sarthak Dhaytonde
Dept. of Computer Engineering Pune Institute of Computer Technology
Pune, India sarthakdhaytonde014@gmail.com
Ajit Kale
Dept. of Computer Engineering Pune Institute of Computer Technology
Pune, India ajitkale2406@gmail.com
Soham Labba
Dept. of Computer Engineering Pune Institute of Computer Technology
Pune, India labbasoham18@gmail.com
Kartik Kasrewar
Dept. of Computer Engineering Pune Institute of Computer Technology
Pune, India kasrewarkartik.0709@gmail.com
Abstract—Disease prediction in healthcare involves evaluating the likelihood of a patient’s condition by analyzing their symptoms. Accurate and early prediction of diseases can significantly improve treatment efficacy, optimize patient care, and reduce healthcare costs. While prior research has employed machine learning models such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and RUSBoost for symptom-based disease detection, these approaches often face limitations, including suboptimal accuracy, reliance on unprocessed data, and a narrow focus on symptom analysis. To overcome these challenges, this research introduces a novel hybrid machine learning framework that enhances accuracy and reliability in disease prediction. The proposed model utilizes a curated medical dataset from Kaggle, preprocessed by assigning symptom weights based on their clinical significance and rarity. The framework integrates Decision Trees, K-Fold Cross-Validation, Multinomial Logistic Regression, and Gradient Boosting (GB) algorithms. Decision Trees are employed for interpretable feature selection, K-Fold Cross-Validation ensures robust model evaluation, Multinomial Logistic Regression handles multi-class classification, and Gradient Boosting enhances predictive performance through ensemble learning. Experimental results demonstrate that the proposed model achieves superior accuracy, precision, and recall compared to existing state-of-the-art methods. This research advances the field of automated healthcare systems by providing a reliable tool for early disease prediction and personalized treatment planning. The proposed framework has the potential to transform healthcare delivery by enabling timely interventions and improving patient outcomes.
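As an illustration of two steps named in the abstract, rarity-based symptom weighting and K-fold splitting, the following minimal Python sketch may help. It is not the authors' implementation. The inverse-frequency formula log(N / count) + 1, the function names (rarity_weights, encode, k_fold_indices), and the toy records are all assumptions made for illustration; the clinical-significance component of the weights is not modeled here.

```python
import math
from collections import Counter


def rarity_weights(records):
    """Map each symptom to an inverse-frequency weight: log(N / count) + 1.

    Assumed formula (not from the paper): symptoms seen in fewer records
    receive larger weights; a symptom present in every record gets 1.0.
    """
    n = len(records)
    counts = Counter(s for symptoms in records for s in set(symptoms))
    return {s: math.log(n / c) + 1.0 for s, c in counts.items()}


def encode(symptoms, weights, vocabulary):
    """Turn a symptom list into a weighted feature vector over a fixed vocabulary."""
    present = set(symptoms)
    return [weights[s] if s in present else 0.0 for s in vocabulary]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical toy records, not taken from the paper's Kaggle dataset.
records = [
    ["fever", "cough"],
    ["fever", "rash"],
    ["fever", "cough", "fatigue"],
    ["fever", "headache"],
]
weights = rarity_weights(records)
vocabulary = sorted(weights)
features = [encode(r, weights, vocabulary) for r in records]
folds = list(k_fold_indices(len(records), 2))
```

In this sketch each weighted vector would then be passed to a classifier, and every fold serves once as the held-out test set. In practice the folds would usually be shuffled and stratified by disease label, which this sketch omits.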