Pima Indian Diabetes Database and Machine Learning Models for Diabetes Prediction

Authors

  • Linshan Xie

DOI:

https://doi.org/10.54097/z7hz7j81

Keywords:

Diabetes; random forest (RF); logistic regression (LR); support vector machine (SVM).

Abstract

Diabetes mellitus has become a growing epidemic worldwide in recent years and has attracted considerable attention from researchers. The disease and its complications place a rising burden on healthcare systems, so identifying diabetes early is crucial to protect individuals from serious consequences. In this study, four machine learning (ML) models are applied to the Pima Indian Diabetes Database: k-nearest neighbors (K-NN) is used to impute missing values in the dataset, and logistic regression (LR), support vector machine (SVM), and random forest (RF) models are used to predict diabetes. The results show that the LR model performs slightly better than the RF and SVM models, with a prediction accuracy of 0.7913 and a precision of 0.8571. These evaluations suggest that ML models hold considerable promise for diagnosing diabetes early, and the results can guide further research on diabetes prediction.
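The abstract names two concrete data-handling steps: K-NN imputation of missing values and evaluation by accuracy and precision. The Python sketch below illustrates both under stated assumptions; it is not the author's code. Assumptions: a 0 in selected columns marks a missing value (the abstract does not say how missingness was encoded), K-NN distance is Euclidean over the features both rows observe, and the helper names `zeros_to_missing`, `knn_impute`, `accuracy`, and `precision` are hypothetical. The LR, SVM, and RF classifiers themselves are not reproduced.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def zeros_to_missing(rows: Sequence[Sequence[float]], columns: Sequence[int]) -> List[Row]:
    """Mark 0 as missing (None) in the given columns.

    Assumption: the paper does not state how missing values were encoded; in
    commonly distributed copies of the Pima data, a 0 in fields such as glucose
    or BMI is physiologically implausible and is often treated as missing.
    """
    cols = set(columns)
    return [[None if (j in cols and v == 0) else float(v) for j, v in enumerate(r)]
            for r in rows]


def knn_impute(rows: Sequence[Row], k: int = 3) -> List[Row]:
    """Replace each None with the mean of that feature over the k nearest donor rows.

    Distance is Euclidean over the features observed in both rows, and a donor
    must observe the feature being filled. Features are assumed to be on
    comparable scales (in practice they are usually standardized first).
    """
    imputed = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # skip the row itself and rows that also lack feature j
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            if not candidates:
                raise ValueError(f"no donor row observes feature {j}")
            candidates.sort(key=lambda c: c[0])  # nearest donors first
            nearest = candidates[:k]
            filled[j] = sum(v for _, v in nearest) / len(nearest)
        imputed.append(filled)
    return imputed


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """(TP + TN) / total: the share of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FP): the share of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    # Hypothetical toy rows: [pregnancies, glucose, bmi]; a glucose of 0 means "missing".
    raw = [[1, 120, 30.0], [2, 0, 31.0], [1, 130, 29.0], [8, 190, 45.0]]
    data = knn_impute(zeros_to_missing(raw, columns=[1, 2]), k=2)
    print(data[1])  # glucose filled with the mean of the two nearest rows
    print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
    print(precision([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In this toy run the missing glucose value is filled with 125.0, the mean of the two nearest rows (120 and 130). In practice, features are usually standardized before computing K-NN distances, and library implementations such as scikit-learn's `KNNImputer`, `LogisticRegression`, `SVC`, `RandomForestClassifier`, `accuracy_score`, and `precision_score` would typically replace these hand-written helpers.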

Published

29-03-2024

How to Cite

Xie, L. (2024). Pima Indian Diabetes Database and Machine Learning Models for Diabetes Prediction. Highlights in Science, Engineering and Technology, 88, 97-103. https://doi.org/10.54097/z7hz7j81