Comparative Analysis of Predicting Diabetes in Senior Age Group Based on Machine Learning Models

Authors

  • Mengyuan Zhang

DOI:

https://doi.org/10.54097/22tepd66

Keywords:

Logistic regression, random forest, KNN.

Abstract

This study investigated the factors that contribute to diabetes risk in older populations and their impact on the onset of the disease. Analyzing a Kaggle dataset of 18,100 older adults, the study found significant associations between diabetes and age, Body Mass Index (BMI), HbA1c levels, blood sugar levels, high blood pressure, heart disease, and smoking history. A comparison of logistic regression, random forest, and K-nearest neighbor (KNN) prediction models showed that the random forest model performed best, with an ROC-AUC of 0.88, an out-of-bag error rate of 7.58%, a sensitivity of 99%, and a specificity of 92.3%. It outperformed both the logistic regression model (ROC-AUC 0.85) and the KNN model (ROC-AUC 0.80). The results indicate that the random forest model handles nonlinear data and multi-variable interactions well, making it suitable for predicting diabetes risk in the elderly. Future research could incorporate lifestyle factors to improve prediction accuracy.
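The evaluation metrics named in the abstract (sensitivity, specificity, and ROC-AUC) can be illustrated with a minimal Python sketch. This is not the author's code: the helper names `sensitivity_specificity` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made purely for illustration, and none of the numbers come from the study's dataset.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = diabetes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC computed as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 diabetic and 6 non-diabetic cases with
# made-up predicted probabilities (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ROC-AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is why the abstract can compare the three models on ROC-AUC alone.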

Published

24-12-2024

How to Cite

Zhang, M. (2024). Comparative Analysis of Predicting Diabetes in Senior Age Group Based on Machine Learning Models. Highlights in Science, Engineering and Technology, 123, 611-617. https://doi.org/10.54097/22tepd66