Research on Prediction of Breast Cancer Type using Machine Learning

Authors

  • Dehui Kong

DOI:

https://doi.org/10.54097/hset.v54i.9808

Keywords:

Breast cancer, classification, machine learning.

Abstract

Breast cancer is the most common cancer among women worldwide. In 2020 alone, it caused about 0.68 million deaths, 6.9% of all cancer deaths. One of the main obstacles in its diagnosis is classifying tumors as benign (non-cancerous) or malignant (cancerous). This study uses machine learning models to support an accurate and reliable diagnosis from basic tumor measurements such as smoothness, texture and area. Five models, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and the Naive Bayes Classifier (NBC), are applied within three modelling systems: a full model using all features, a feature selection-ML system and a principal component analysis (PCA)-ML system. Together they predict the tumor type in the Wisconsin Breast Cancer Dataset. Model performance is assessed with three metrics: accuracy, precision and recall. In the full model, Random Forest achieves the highest prediction accuracy, 98.25% out-of-sample and 100% in-sample, while the SVM with a sigmoid kernel has the lowest, 83.33% out-of-sample and 85.27% in-sample. Among the feature selection models based on RF and LR, RF using only 13 variables achieves the highest accuracy, 98.25% out-of-sample and 100% in-sample. Among the PCA-ML models, PCA-NBC has the highest out-of-sample accuracy, 97.33%, whereas PCA-RF has the highest in-sample accuracy, 100%.
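The three evaluation metrics can be made concrete with a short Python sketch. The sketch assumes that malignant tumors are the positive class, because the abstract does not say which class is positive. The labels are toy values and are not data from the study. "In-sample" means the metric is computed on the training data. "Out-of-sample" means it is computed on held-out test data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN), treating `positive` (assumed: malignant) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # malignant, predicted malignant
        elif p == positive:
            fp += 1          # benign, predicted malignant
        elif t == positive:
            fn += 1          # malignant, predicted benign
        else:
            tn += 1          # benign, predicted benign
    return tp, fp, fn, tn


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Accuracy = correct / all; precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels (1 = malignant, 0 = benign), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(accuracy_precision_recall(y_true, y_pred))  # (0.625, 0.6, 0.75)
```

Recall is often the metric to watch in this setting. A missed malignant tumor (a false negative) usually costs more than a false alarm.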
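The three modelling systems can be sketched with scikit-learn, which bundles the Wisconsin Diagnostic Breast Cancer data as `load_breast_cancer`. This assumes the study used the diagnostic version, whose features include smoothness, texture and area. The sketch is not the author's code. The following choices are all assumptions made for illustration:

- the 80/20 stratified split;
- the hyperparameters;
- the RBF kernel shown alongside the sigmoid kernel;
- picking the 13 features by random-forest importance;
- using 10 principal components.

The printed numbers should therefore not be expected to match the reported results.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 569 tumors x 30 features. scikit-learn encodes 0 = malignant, 1 = benign,
# so flip the labels to make malignant the positive class.
X, y = load_breast_cancer(return_X_y=True)
y = 1 - y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # assumed 80/20 split


def models():
    """Fresh, unfitted classifiers; hyperparameters and the RBF kernel are assumptions."""
    return {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
        "SVM-rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "SVM-sigmoid": make_pipeline(StandardScaler(), SVC(kernel="sigmoid")),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "NBC": GaussianNB(),
    }


def evaluate(model, Xtr, Xte):
    """Fit on the training split; report in-sample and out-of-sample scores."""
    model.fit(Xtr, y_train)
    pred = model.predict(Xte)
    return {
        "in_acc": accuracy_score(y_train, model.predict(Xtr)),
        "out_acc": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
    }


# 1) Full model: all 30 features.
full = {name: evaluate(m, X_train, X_test) for name, m in models().items()}

# 2) Feature selection: keep the 13 features a random forest ranks highest
#    by impurity importance (the ranking rule is an assumption).
ranker = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
top13 = np.argsort(ranker.feature_importances_)[::-1][:13]
selected = {name: evaluate(m, X_train[:, top13], X_test[:, top13])
            for name, m in models().items()}

# 3) PCA-ML: standardize, project onto 10 principal components, then classify.
#    PCA sits inside the pipeline, so it is fitted on the training split only.
pca = {name: evaluate(make_pipeline(StandardScaler(), PCA(n_components=10), m),
                      X_train, X_test)
       for name, m in models().items()}

for system, results in [("full", full), ("FS-13", selected), ("PCA", pca)]:
    for name, r in results.items():
        print(f"{system:6s} {name:12s} in={r['in_acc']:.4f} out={r['out_acc']:.4f} "
              f"prec={r['precision']:.4f} rec={r['recall']:.4f}")
```

Calling `models()` for each system gives every system its own unfitted estimators. As a result, no fitted state leaks between the full, feature-selection and PCA runs.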




Published

04-07-2023

How to Cite

Kong, D. (2023). Research on Prediction of Breast Cancer Type using Machine Learning. Highlights in Science, Engineering and Technology, 54, 440-447. https://doi.org/10.54097/hset.v54i.9808