Stroke Prediction Based on Logistic Regression Model
DOI:
https://doi.org/10.54097/cx2f3j88
Keywords:
Stroke prediction, random forest, logistic regression, LASSO, XGBoost, machine learning.
Abstract
The prediction of stroke from risk factors and basic demographic information can be valuable for primary prevention and for community healthcare workers. Machine learning models are increasingly used in clinical prediction because of their potential for high accuracy. This study investigates stroke prediction with four machine learning models: logistic regression, random forest, Lasso regression (Least Absolute Shrinkage and Selection Operator), and Extreme Gradient Boosting (XGBoost). The predictors are age, gender, work type, heart disease history, marital status, residence type, smoking status, Body Mass Index (BMI), and average glucose level. Of the 3,255 observations in the dataset, only 180 individuals (about 5.5%) experienced a stroke, so the dataset is highly imbalanced. Balanced accuracy is therefore the key metric used to compare model performance. The balanced accuracies of logistic regression, Lasso regression, random forest, and XGBoost are 78.4%, 61.7%, 50%, and 67.7%, respectively. Logistic regression performed best, and it also highlighted the significant role of age, particularly for individuals over 45.
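Because only about 5.5% of the observations are stroke cases, plain accuracy can look high even for a useless model. Balanced accuracy is the mean of sensitivity (recall on the stroke class) and specificity (recall on the non-stroke class), so a classifier that never predicts stroke scores exactly 50%. The sketch below illustrates this in plain Python using only the class counts stated above. The two prediction vectors are hypothetical (`y_all_negative` predicts "no stroke" for everyone; `y_screen` is an invented screening rule that flags 150 strokes and 700 non-strokes); this is not the authors' evaluation code, and the abstract does not say whether random forest's 50% arose this way.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0).

    Assumes both classes are present in y_true; otherwise a ZeroDivisionError is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Class counts taken from the abstract: 3,255 observations, 180 strokes (label 1).
N_TOTAL, N_STROKE = 3255, 180
N_NO_STROKE = N_TOTAL - N_STROKE
y_true = [1] * N_STROKE + [0] * N_NO_STROKE

# Hypothetical classifier 1: predicts "no stroke" for every individual.
y_all_negative = [0] * N_TOTAL

# Hypothetical classifier 2: flags 150 of the 180 strokes and 700 of the non-strokes.
y_screen = [1] * 150 + [0] * 30 + [1] * 700 + [0] * (N_NO_STROKE - 700)

if __name__ == "__main__":
    for name, y_pred in [("all negative", y_all_negative), ("screen", y_screen)]:
        print(f"{name:>12}: accuracy={accuracy(y_true, y_pred):.4f}  "
              f"balanced accuracy={balanced_accuracy(y_true, y_pred):.4f}")
```

The all-negative classifier reaches an accuracy of about 94.5% yet a balanced accuracy of 50%, while the hypothetical screening rule has lower accuracy but a much higher balanced accuracy, which is why the latter metric is the more informative one on imbalanced data. For the binary case, scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same quantity.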
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.