Prediction and Visualization Analysis of Lung Cancer Risk by Machine Learning

Authors

  • Ziqi Wan

DOI:

https://doi.org/10.54097/hset.v39i.6531

Keywords:

Machine Learning; Lung Cancer; Visualization; Cancer Prediction; Random Forest.

Abstract

Lung cancer is one of the most lethal cancers in the world. It gravely threatens human health and causes more than one million deaths annually. Current research demonstrates that early and accurate diagnosis of lung cancer is the key to reducing the mortality rate, which motivates us to predict cancer at an early stage by exploiting the capacity of machine learning. In this work, both qualitative visual analysis and quantitative experiments are conducted to identify the relevant risk factors and to evaluate the effectiveness of lung cancer risk prediction. Visual analyses display the distributions of patient characteristics and their correlations with lung cancer risk. The visualizations indicate that smoking has a relatively large impact on lung cancer. Moreover, several machine learning algorithms are implemented and compared, including logistic regression, K-nearest neighbors, random forest, and decision tree. These methods achieve satisfactory prediction results, with accuracies of about 84% to 90%.
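To make the accuracy figures concrete, the following is a minimal sketch of how one of the listed methods, K-nearest neighbors, could be evaluated on a held-out test set. It is not the author's implementation. The data generator, the feature names (age, smoking score), the labeling rule, and the function names are all hypothetical and exist only so the example runs without the paper's dataset.

```python
import math
import random
from collections import Counter


def make_synthetic_patients(n, seed=0):
    """Generate hypothetical (features, label) pairs. NOT the paper's data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        smoking = rng.uniform(0, 10)  # made-up 0-10 smoking score
        # Assumed toy rule: risk driven mostly by smoking, slightly by age.
        score = 0.8 * smoking + 0.05 * (age - 20) + rng.gauss(0, 1)
        label = 1 if score > 5.5 else 0  # 1 = high risk, 0 = low risk
        # Scale both features to roughly [0, 1] so distances are comparable.
        rows.append(((age / 80, smoking / 10), label))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


data = make_synthetic_patients(300)
train, test = data[:240], data[240:]  # simple 80/20 hold-out split
acc = accuracy(train, test, k=5)
print(f"KNN accuracy on synthetic data: {acc:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the paper's reported 84% to 90%. The same hold-out accuracy measure would apply to the other compared classifiers.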

Published

01-04-2023

How to Cite

Wan, Z. (2023). Prediction and Visualization Analysis of Lung Cancer Risk by Machine Learning. Highlights in Science, Engineering and Technology, 39, 221-229. https://doi.org/10.54097/hset.v39i.6531