Prediction of Living Status for Breast Cancer Patients Using Machine Learning
DOI:
https://doi.org/10.54097/rnqpr864
Keywords:
Breast cancer, Machine learning, Random Forest, Recursive feature elimination.
Abstract
Owing to factors such as diet, lifestyle, and genetics, breast cancer has become a common disease with high mortality, and it is not limited to women. Many patients suffer severe illness or even death because treatment is delayed or the tumor is predicted incorrectly. This study combines oversampling and undersampling to obtain a dataset of 500 observations in total. Using recursive feature elimination with a random forest, it constructs a model and selects the levels of four proteins in the body, together with age, as the features most relevant to predicting patients' living status. The model achieves a relatively high accuracy of 98.052% and can give physicians relatively accurate information to support targeted treatment. It can also serve as a warning that alerts patients to their current physical condition. Early treatment and accurate diagnosis can significantly improve patient survival rates.
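The pipeline summarized in the abstract (resampling to 500 observations, then recursive feature elimination wrapped around a random forest) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the helper name balance, the 250-per-class target, the train/test split, and all hyperparameters are assumptions made for illustration, and scikit-learn is assumed as the library. Only the choice of five retained features mirrors the abstract (four protein levels plus age).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the patient table (assumption: not the study's data).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Hold out a test set before resampling so duplicated rows cannot leak into it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def balance(X, y, n_per_class, seed=0):
    """Hypothetical helper: resample every class to exactly n_per_class rows.

    Classes larger than the target are undersampled (without replacement);
    classes smaller than the target are oversampled (with replacement).
    """
    parts_X, parts_y = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        X_r = resample(X_c, replace=len(X_c) < n_per_class,
                       n_samples=n_per_class, random_state=seed)
        parts_X.append(X_r)
        parts_y.append(np.full(n_per_class, label))
    return np.vstack(parts_X), np.concatenate(parts_y)


# 250 rows per class gives 500 observations in total, as in the abstract.
X_bal, y_bal = balance(X_train, y_train, n_per_class=250)

# Recursive feature elimination: refit the forest, drop the least important
# feature, and repeat until five features remain.
selector = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=5, step=1)
selector.fit(X_bal, y_bal)

# Evaluate the forest fitted on the selected features using untouched test rows.
accuracy = accuracy_score(y_test, selector.predict(X_test))
print("Selected feature indices:", np.flatnonzero(selector.support_))
print("Held-out accuracy:", round(accuracy, 3))
```

Splitting before resampling is a design choice of this sketch: oversampling duplicates rows, so resampling first could place copies of the same observation in both the training and test sets and inflate the reported accuracy. The accuracy printed here reflects the synthetic data only and says nothing about the 98.052% reported in the abstract.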
License
Copyright (c) 2023 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







