Stroke Prediction Base on Logistic Regression Model

Authors

  • Le Li

DOI:

https://doi.org/10.54097/cx2f3j88

Keywords:

Stroke prediction, random forest, logistic regression, LASSO, XGBoost, machine learning.

Abstract

The prediction of stroke using risk factors and basic demographic information can be valuable for primary prevention and for community healthcare workers. Machine learning models are becoming more prevalent in clinical prediction due to their high accuracy. This study investigates stroke prediction using four machine learning models: logistic regression, random forest, Lasso regression (Least Absolute Shrinkage and Selection Operator), and Extreme Gradient Boosting (XGBoost). Nine variables (age, gender, work type, heart disease history, marital status, residency type, smoking status, Body Mass Index (BMI), and average glucose level) are used for prediction. Of the 3,255 observations in the dataset, 180 individuals experienced a stroke (about 5.5%), so the dataset is highly imbalanced. Balanced accuracy is therefore the key metric used to compare model performance. The balanced accuracies of logistic regression, Lasso regression, random forest, and XGBoost are 78.4%, 61.7%, 50%, and 67.7%, respectively. Logistic regression demonstrated the strongest performance and also highlighted the significant role of age, particularly for individuals over 45.
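
To illustrate why balanced accuracy rather than plain accuracy is the comparison metric for data this imbalanced, the following is a minimal sketch and not the author's code. It uses only the class counts stated in the abstract (3,255 observations, 180 strokes). The always-negative classifier and the function names `accuracy` and `balanced_accuracy` are hypothetical and introduced purely for illustration. Balanced accuracy is taken here as the mean of sensitivity and specificity, the usual definition for binary classification.

```python
def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    sensitivity = true_pos / positives
    specificity = true_neg / negatives
    return (sensitivity + specificity) / 2


# Class counts from the abstract: 3,255 observations, 180 with stroke.
n_total, n_stroke = 3255, 180
y_true = [1] * n_stroke + [0] * (n_total - n_stroke)

# Hypothetical classifier (an assumption for illustration): always predicts "no stroke".
y_pred = [0] * n_total

print(f"accuracy          = {accuracy(y_true, y_pred):.4f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.4f}")
```

Such a classifier is correct on roughly 94% of observations while detecting no strokes at all, and its balanced accuracy is exactly 50%. That is the value a model gets by predicting only the majority class; the abstract does not state why the random forest scored 50%, so this is one possible reading and not a reported finding.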

Published

24-12-2024

How to Cite

Li, L. (2024). Stroke Prediction Base on Logistic Regression Model. Highlights in Science, Engineering and Technology, 123, 574-578. https://doi.org/10.54097/cx2f3j88