Diabetes Prediction Using Random Forest in Healthcare

Authors

  • Shengyu Wang

DOI:

https://doi.org/10.54097/5ndh9a05

Keywords:

Diabetes prediction, random forest, logistic regression.

Abstract

Accurate diabetes prediction has become a crucial problem in healthcare. Identifying individuals at risk of diabetes allows for prompt intervention and tailored treatment strategies. Machine learning models are now commonly employed for this task, and many studies have developed such models for diabetes prediction. Random forest, a popular ensemble learning algorithm, has shown strong performance for diabetes prediction. This paper therefore presents a study that applies the random forest algorithm to diabetes prediction. A random forest model is trained on a publicly available diabetes dataset, and its performance is compared with that of logistic regression. Missing-data imputation techniques are also used to improve data integrity. In terms of model performance, the random forest model significantly outperforms the logistic regression model. This result highlights the advantage of tree-based models such as random forest over logistic regression for predicting diabetes.
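The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual code: it assumes scikit-learn, uses synthetic stand-in data rather than the diabetes dataset, and picks median imputation, 5-fold cross-validation, and the hyperparameters shown purely for illustration. The helper name compare_models is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Return mean 5-fold cross-validated accuracy for each model."""
    models = {
        # Logistic regression baseline: impute, scale, then fit.
        "logistic_regression": make_pipeline(
            SimpleImputer(strategy="median"),
            StandardScaler(),
            LogisticRegression(max_iter=1000),
        ),
        # Random forest: trees need no scaling, only imputation.
        "random_forest": make_pipeline(
            SimpleImputer(strategy="median"),
            RandomForestClassifier(n_estimators=200, random_state=seed),
        ),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
        for name, model in models.items()
    }


# Synthetic stand-in data (assumption): 8 numeric features, binary label.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Knock out about 5% of entries so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

scores = compare_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Placing the imputer inside each pipeline means it is refit on the training folds only, so no information from held-out folds leaks into the imputed values. Which model scores higher on this synthetic data says nothing about the paper's reported results.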

Published

10-04-2024

How to Cite

Wang, S. (2024). Diabetes Prediction Using Random Forest in Healthcare. Highlights in Science, Engineering and Technology, 92, 210-217. https://doi.org/10.54097/5ndh9a05