Construction and Comparison of Coronary Heart Disease Risk Prediction Models Based on Lasso-Logistic Regression and Random Forest Models
DOI: https://doi.org/10.54097/yvgtrg09
Keywords: Coronary heart disease, Lasso regression, Logistic regression, Random Forest, Risk prediction model.
Abstract
Coronary heart disease (CHD) is a leading cause of death worldwide, and its risk factors are both genetic and environmental. CHD mortality is high in South Africa and China, and prevalence is rising among younger people because of lifestyle changes.
Topic: This study investigates the relationship between CHD and its risk factors using cardiovascular data from Framingham, Massachusetts, in order to develop a predictive model.
Methods: Data were processed in R 4.4.1 and divided into a training set (70%) and a testing set (30%). Feature variables were selected through univariate analysis and Lasso regression. Prediction models were built with multivariable logistic regression and with the random forest machine learning algorithm. Model performance was assessed using ROC curves and confusion matrices.
Results: Lasso regression, univariate logistic regression, and stepwise logistic regression indicated that gender, age, daily smoking, stroke history, hypertension, cholesterol, systolic pressure, and glucose are significant predictors of CHD. The multivariable logistic regression model achieved 85.6% classification accuracy, 85.8% precision, 99.5% recall, an F1 score of 0.921, and an AUC of 0.717. The random forest model showed slightly lower performance, with 85.2% accuracy, 34.8% precision, 99.2% recall, an F1 score of 0.919, and an AUC of 0.615.
Conclusion: This study screened the factors that influence CHD and found that CHD risk prediction models built with machine learning algorithms have good predictive performance. On the dataset used in this study, the logistic regression model performed better than the random forest model.
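The confusion-matrix metrics reported in the abstract (accuracy, precision, recall, F1 score) can be illustrated with a minimal sketch. It is written in Python for illustration, whereas the study itself used R 4.4.1, and it is not the authors' code. The `ConfusionMatrix` class, the `f1_score` helper, and the example counts are hypothetical; only the precision and recall values passed to `f1_score` at the end are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts from a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        # Share of predicted positives that are truly positive.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of true positives that the model finds.
        return self.tp / (self.tp + self.fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up counts, for illustration only (not the study's confusion matrix).
cm = ConfusionMatrix(tp=80, fp=20, fn=10, tn=90)
example_f1 = f1_score(cm.precision(), cm.recall())

# Precision and recall reported for the logistic regression model.
logistic_f1 = f1_score(0.858, 0.995)
print(round(logistic_f1, 3))
```

With the precision (85.8%) and recall (99.5%) reported for the logistic regression model, `f1_score` rounds to 0.921, which agrees with the F1 score stated in the Results.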
License
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







