Stroke Prediction Using Linear Regression
Nahala M A1, Sooraj Subhash2, Kishore Xavier3, Rahul Manoj4, Sreehari V V5
1Asst. Prof., Dept of CSE, Sree Narayana Gurukulam College Of Engineering, Kochi, India,
nahalama@sngce.ac.in
2Student, Dept of CSE, Sree Narayana Gurukulam College Of Engineering, Kochi, India,
soorajsubhash369@gmail.com
3Student, Dept of CSE, Sree Narayana Gurukulam College Of Engineering, Kochi, India,
kishorexavier69@gmail.com
4Student, Dept of CSE, Sree Narayana Gurukulam College Of Engineering, Kochi, India,
rahulmanoj2002@gmail.com
5Student, Dept of CSE, Sree Narayana Gurukulam College Of Engineering, Kochi, India,
sreeharivinod666@gmail.com
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Stroke is one of the leading causes of death and disability worldwide, and early prediction can significantly improve patient outcomes through timely intervention. This study explores the potential of linear regression models to predict the likelihood of stroke in individuals from a set of clinical and demographic factors. The data come from a publicly available stroke dataset; the features include age, gender, hypertension, heart disease, marital status, type of work, and smoking habits, among others. The goal of this study is to identify important predictors and to build a linear regression model capable of approximating stroke risk with reasonable accuracy.
Feature selection and preprocessing guided the choice of relevant variables for the predictive model. The data were split into training and testing subsets, and performance was assessed with metrics such as mean squared error and R-squared. The results show that linear regression on relevant features can be used to predict stroke risk, yielding a simple, easy-to-interpret model for early risk identification. However, the accuracy achieved suggests that other algorithms and additional data could improve reliability. The paper concludes that linear regression offers a viable foundation for stroke prediction, while further refinement and more sophisticated models would be necessary for clinical applications.
Key Words: Stroke prediction, linear regression, feature selection, data preprocessing, machine learning.
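The evaluation workflow summarized in the abstract (fit on a training subset, then report mean squared error and R-squared on a held-out subset) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a single hypothetical predictor (age), synthetic labels generated from a made-up risk curve, an assumed 80/20 split, and helper names (fit_simple_linear_regression, mean_squared_error, r_squared) chosen only for this example.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): stroke probability rises linearly with age.
rng = random.Random(0)
ages = [rng.uniform(20, 90) for _ in range(500)]
strokes = [1 if rng.random() < 0.004 * (age - 20) else 0 for age in ages]

# Assumed 80/20 train/test split on shuffled indices.
indices = list(range(len(ages)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

intercept, slope = fit_simple_linear_regression(
    [ages[i] for i in train_idx], [strokes[i] for i in train_idx]
)

# Evaluate on the held-out subset only.
y_test = [strokes[i] for i in test_idx]
y_hat = [intercept + slope * ages[i] for i in test_idx]
test_mse = mean_squared_error(y_test, y_hat)
test_r2 = r_squared(y_test, y_hat)
print(f"intercept={intercept:.4f} slope={slope:.5f}")
print(f"test MSE={test_mse:.4f} test R^2={test_r2:.4f}")
```

Because the outcome here is a 0/1 label, the fitted line returns a risk score that is not constrained to the interval [0, 1]; with several predictors the same metrics apply unchanged, and only the fitting step becomes a multivariate least squares problem.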