Hybrid XGBoost–Random Forest Ensemble Model for Early Prediction of Type-2 Diabetes Using Multimodal Clinical and Lifestyle Data
Mrs. R. Sumathi¹,
Assistant Professor,
Department of Computer Science with Cognitive Systems and AIML,
Hindusthan College of Arts & Science, Coimbatore.
Ms. E. Kavi Priya²,
Assistant Professor,
Department of Computer Science with Cognitive Systems and AIML,
Hindusthan College of Arts & Science, Coimbatore.
Abstract
Type-2 Diabetes Mellitus (T2DM) is a rapidly escalating global health challenge that is often diagnosed only after clinical symptoms appear, which delays intervention and raises the risk of complications. Early prediction using automated computational approaches can improve disease prognosis and reduce healthcare costs. This research proposes a machine learning framework for early detection of T2DM that integrates multimodal data: clinical parameters (such as glucose level, blood pressure, insulin, and BMI), demographic attributes, and lifestyle indicators (dietary habits, physical activity, stress level, and sleep patterns). The dataset was preprocessed using normalization, missing-value imputation, correlation-based feature selection, and class-imbalance handling with the Synthetic Minority Over-sampling Technique (SMOTE). Several machine learning algorithms, including Logistic Regression, Random Forest, Support Vector Machine, Gradient Boosting, and Extreme Gradient Boosting (XGBoost), were trained and evaluated to identify the best-performing model. Model performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC. Among the evaluated models, XGBoost achieved the highest predictive accuracy and generalized well across test samples. Furthermore, explainable AI (XAI) techniques such as SHAP values were used to interpret feature importance and enhance clinical transparency. The results indicate that combining lifestyle factors with clinical metrics significantly improves predictive performance compared with clinical data alone. The proposed framework shows potential for integration into digital health platforms and preventive screening systems, where it could aid clinicians in early risk stratification and personalized intervention.
Keywords
Type-2 Diabetes Prediction, Machine Learning, Multimodal Healthcare Data, XGBoost, Predictive Analytics, Lifestyle Factors, Explainable AI (XAI), Early Diagnosis.
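For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1-score, and ROC-AUC can be computed for a binary classifier. It is an illustration only, not the implementation used in this study: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example and are not data or results from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = diabetic and 0 = non-diabetic."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win
    (the Mann-Whitney U formulation of the area under the ROC curve).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, NOT results from the study.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print("roc_auc:", roc_auc(y_true, scores))
```

Precision, recall, and F1 depend on the chosen decision threshold, whereas ROC-AUC is computed from the raw scores and is threshold-independent. This is one reason the two kinds of metric are usually reported together when classes are imbalanced, as is common in screening data.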