GlycoDetect: A Diabetes Prediction Model Using ML
Archana Nikose1, Harsh Kuite2, Kalyani Mude2, Aniruddha Polke2, Nikita Nanhe2
1Assistant Professor, Department of CSE, Priyadarshini Bhagwati College of Engineering
2Student, Department of CSE, Priyadarshini Bhagwati College of Engineering
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This project uses machine learning to predict a patient’s risk of developing diabetes from diagnostic measurements. The dataset, compiled by the National Institute of Diabetes and Digestive and Kidney Diseases, contains health indicators for Pima Indian patients. The project involves several key steps: data preprocessing, feature selection, model selection, hyperparameter tuning, and deployment via a Flask web application. In the data preprocessing phase, feature scaling and normalization are used to standardize the dataset, while missing values and outliers are handled to ensure data integrity. Feature selection uses a correlation matrix and recursive feature elimination (RFE) to reduce dimensionality and improve model efficiency. To check that the model generalizes to unseen data, the dataset is split into two parts: 66% for training and 34% for testing. Several machine learning algorithms are evaluated, including logistic regression, naive Bayes, K-nearest neighbours, decision trees, and support vector classifiers. Logistic regression was selected as the final model because of its accuracy on the test data (80.53%). Grid search is used for hyperparameter tuning to improve its performance. The trained model is embedded in the Flask web application, allowing users to enter health metrics and get real-time estimates of diabetes risk. The system is designed to be user-friendly and scalable, providing a practical tool for early diagnosis of diabetes. Together, these steps aim to make the model accurate, reliable, and capable of making real-world predictions.
Keywords: prediction, diabetes, glucose, insulin, machine learning, logistic regression, naive Bayes, k-nearest neighbours, decision tree, support vector classifier.
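The workflow summarized in the abstract (scaling, RFE, a 66%/34% split, logistic regression, and grid search) can be illustrated with a minimal sketch, assuming scikit-learn. This is not the project's actual code: the data here is a synthetic stand-in generated with make_classification, and the number of features kept by RFE, the grid of C values, the 5-fold cross-validation, and the random seeds are all illustrative assumptions. The accuracy it prints therefore has no relation to the 80.53% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): 768 rows, 8 numeric features.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=1, random_state=0
)

# 66% training / 34% testing, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, stratify=y, random_state=0
)

# Scaling, RFE, and the classifier sit in one pipeline so that every
# cross-validation fold fits the scaler and selector on its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    # Keeping 5 features is an illustrative choice, not a value from the paper.
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search grid over the regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the best pipeline on the held-out 34%.
test_accuracy = search.score(X_test, y_test)
print(f"best C: {search.best_params_['clf__C']}, test accuracy: {test_accuracy:.3f}")
```

Wrapping the preprocessing steps in the pipeline is a design choice made for this sketch: it keeps information from the validation folds and the test split out of the scaler and the feature selector.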