Logistic Regression for Stroke Prediction: An Evaluation of its Accuracy and Validity

Authors

  • Lechi Wang

DOI:

https://doi.org/10.54097/hset.v39i.6712

Keywords:

Stroke Prediction; Logistic Regression; Gradient Descent; Regularization.

Abstract

 A stroke, also known as a brain attack, is a serious medical condition that occurs when the blood supply to the brain is disrupted. It is a leading cause of death globally, accounting for about 11% of all deaths. This paper uses logistic regression to predict stroke. It first describes the methods used to preprocess the raw dataset: data cleaning, label encoding, oversampling, splitting the dataset, and feature scaling. The modeling section then explains in detail how the logistic regression model is constructed. The first step is the sigmoid function, the foundation of the model. Next, a cost function is built to measure the difference between the model's predicted values and the true values. The author then constructs a gradient-computing function, which determines the direction and size of the parameter update at each iteration of gradient descent. The gradient descent function is built on top of it, and predictions are made by feeding the test variables into the fitted model. In analyzing the results, the author compares the model's performance without and with regularization and concludes that regularization helps improve performance. The final model achieves a prediction accuracy of over 95% on the collected dataset.
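The modeling steps listed in the abstract (sigmoid, cost function, gradient computation, gradient descent, prediction) can be illustrated with a minimal sketch in plain Python. This is not the author's code. The function names, the learning rate `alpha`, the iteration count, the 0.5 decision threshold, and the choice of an L2 penalty with strength `lam` are all assumptions made for illustration, since the abstract does not specify which form of regularization was used.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    # Estimated probability of the positive class for one sample x.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def cost(X, y, w, b, lam=0.0):
    # Mean cross-entropy loss plus an (assumed) L2 penalty on the weights.
    m = len(X)
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(xi, w, b), eps), 1.0 - eps)
        total += -yi * math.log(p) - (1 - yi) * math.log(1.0 - p)
    return total / m + (lam / (2 * m)) * sum(wj * wj for wj in w)

def gradient(X, y, w, b, lam=0.0):
    # Partial derivatives of the cost with respect to w and b.
    m = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for xi, yi in zip(X, y):
        err = predict_proba(xi, w, b) - yi
        for j, xj in enumerate(xi):
            dw[j] += err * xj
        db += err
    # The bias term is not regularized in this sketch.
    dw = [dwj / m + (lam / m) * wj for dwj, wj in zip(dw, w)]
    return dw, db / m

def gradient_descent(X, y, alpha=0.1, iters=1000, lam=0.0):
    # Batch gradient descent starting from zero parameters.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(iters):
        dw, db = gradient(X, y, w, b, lam)
        w = [wj - alpha * dwj for wj, dwj in zip(w, dw)]
        b -= alpha * db
    return w, b

def predict(X, w, b, threshold=0.5):
    # Hard class labels from the fitted parameters.
    return [1 if predict_proba(xi, w, b) >= threshold else 0 for xi in X]
```

Calling `gradient_descent(X_train, y_train, lam=0.0)` and then again with a positive `lam` would give the two variants (without and with regularization) whose test-set predictions could be compared. The inputs here are lists of already scaled feature rows and 0/1 labels, which is a hypothetical data layout.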

Published

01-04-2023

How to Cite

Wang, L. (2023). Logistic Regression for Stroke Prediction: An Evaluation of its Accuracy and Validity. Highlights in Science, Engineering and Technology, 39, 1086-1092. https://doi.org/10.54097/hset.v39i.6712