Using Machine Learning Algorithms to Predict Parkinson's Disease: A Comparative Analysis Based on Logistic Regression, Random Forest and XGBoost

Authors

  • Jinqiu Zhang

DOI:

https://doi.org/10.54097/ggc4dg65

Keywords:

Parkinson, machine learning, logistic regression, random forest, XGBoost, data preprocessing, model evaluation.

Abstract

Parkinson's disease is a neurodegenerative disease that seriously affects patients' quality of life, and early, accurate diagnosis is crucial for treatment. This study compares three methods for predicting the disease: logistic regression, random forest, and Extreme Gradient Boosting (XGBoost). Data preprocessing consists of removing irrelevant features, deleting duplicate records, handling missing values, and applying Z-score standardization. The results show that the XGBoost model performs best on the test set, with recall, accuracy, precision, and F1 score all at 0.87, compared with 0.862 for random forest and 0.774 for logistic regression. XGBoost also reaches an AUC of 0.95, indicating the best classification ability of the three models. Despite overfitting, XGBoost performs well on complex nonlinear relationships. Future research can optimize hyperparameters, expand the data set, and explore deep learning techniques to further improve prediction accuracy and generalization ability. This study provides evidence supporting machine learning-based prediction of Parkinson's disease, which can help improve early detection.
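The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the author's code: the column names (`PatientID`, `Age`, `UPDRS`), the use of mean imputation for missing values, and the population standard deviation in the Z-score are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def preprocess(rows, drop_columns=()):
    """Drop columns, remove duplicates, impute missing values, z-score each column."""
    # 1. Remove irrelevant features (column names are hypothetical).
    rows = [{k: v for k, v in r.items() if k not in drop_columns} for r in rows]

    # 2. Delete duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    for col in list(unique[0]):
        # 3. Handle missing values (None) by mean imputation (an assumption).
        observed = [r[col] for r in unique if r[col] is not None]
        fill = mean(observed)
        for r in unique:
            if r[col] is None:
                r[col] = fill

        # 4. Z-score standardization: z = (x - mean) / std.
        values = [r[col] for r in unique]
        mu, sigma = mean(values), pstdev(values)
        for r in unique:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0
    return unique


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Calling `preprocess(records, drop_columns=("PatientID",))` on a list of row dictionaries returns standardized features, and `classification_metrics` can then be applied to each model's test-set predictions to compare them on the same four scores the abstract reports.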

Published

29-01-2026

Section

Articles

How to Cite

Zhang, J. (2026). Using Machine Learning Algorithms to Predict Parkinson’s Disease: A Comparative Analysis Based on Logistic Regression, Random Forest and XGBoost. Academic Journal of Science and Technology, 19(2), 10-18. https://doi.org/10.54097/ggc4dg65