Visual Insights in Lung Cancer Prediction: Leveraging Data Visualization and Analysis for Building and Tuning Machine Learning Models

Authors

  • Zeyu Chen
  • Ching-Syuan Chien
  • Heming Guan

DOI:

https://doi.org/10.54097/a9a6xf33

Keywords:

Lung cancer prediction, Exploratory Data Analysis, Data visualization, Machine learning models.

Abstract

Lung cancer is a significant global health challenge because of its high incidence and high lethality. Because the disease is severe and its symptoms become complicated in later stages, early detection by healthcare facilities is essential. Traditional diagnostic methods are often costly and require specialized expertise, which has prompted the exploration of alternatives such as machine learning. This paper uses a dataset from a Nature Medicine study of 462,000 participants in China and applies machine learning to predict lung cancer risk. Using Exploratory Data Analysis (EDA), the paper examines the correlations of demographic variables, environmental factors, and lifestyle habits with the probability of lung cancer occurrence. The neural network architecture incorporates dynamic layers, rectified linear unit (ReLU) activation, and the Adam optimizer with dropout regularization. The EDA results reveal correlations between lifestyle factors and lung cancer risk. The initial machine learning model achieved 70% prediction accuracy, which rose to 90% after refinement guided by the EDA and the LIME interpreter. This study aims to advance lung cancer prediction by combining techniques and technologies such as data visualization, extensive data analysis, and machine learning. It also aims to showcase how recent big-data technologies can be applied in medical research and the healthcare industry. The model's performance suggests that machine learning models targeted at cancer prediction are viable and points toward further advances, such as personalized prediction models.
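
The abstract names the model's ingredients (ReLU activation, dropout regularization, the Adam optimizer, and "dynamic layers") but not their configuration. The following is a minimal, dependency-free Python sketch of how those pieces commonly fit together in a binary risk classifier; it is not the authors' implementation. Assumptions: "dynamic layers" is read here as a depth and width set by a `layer_sizes` list; the names `TinyMLP` and `demo`, the hyperparameters, and the synthetic two-feature data are hypothetical; the paper's actual framework, features, and settings are not stated in this abstract, and the LIME step is not covered.

```python
import math
import random


def relu(v):
    """Element-wise rectified linear unit (ReLU)."""
    return [max(0.0, x) for x in v]


class TinyMLP:
    """Hypothetical sketch: a binary classifier whose depth and width come from layer_sizes."""

    def __init__(self, layer_sizes, dropout=0.2, lr=0.01, seed=0):
        self.rng = random.Random(seed)
        self.dropout, self.lr = dropout, lr
        self.b1, self.b2, self.eps, self.t = 0.9, 0.999, 1e-8, 0  # common Adam defaults
        self.W, self.b = [], []
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            std = math.sqrt(2.0 / n_in)  # He initialisation, a usual choice for ReLU layers
            self.W.append([[self.rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)])
            self.b.append([0.0] * n_out)
        # Adam first (m) and second (v) moment estimates, shaped like the parameters.
        self.mW = [[[0.0] * len(row) for row in W] for W in self.W]
        self.vW = [[[0.0] * len(row) for row in W] for W in self.W]
        self.mb = [[0.0] * len(b) for b in self.b]
        self.vb = [[0.0] * len(b) for b in self.b]

    def forward(self, x, train=False):
        """Return every layer's activations plus the dropout masks of the hidden layers."""
        acts, masks = [list(x)], []
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = [sum(w * a for w, a in zip(row, acts[-1])) + bj for row, bj in zip(W, b)]
            if i < len(self.W) - 1:  # hidden layer: ReLU, then dropout while training
                if train and self.dropout > 0:
                    keep = 1.0 - self.dropout
                    # Inverted dropout: drop with prob. `dropout`, rescale survivors by 1/keep.
                    mask = [1.0 / keep if self.rng.random() < keep else 0.0 for _ in z]
                else:
                    mask = [1.0] * len(z)
                masks.append(mask)
                acts.append([h * m for h, m in zip(relu(z), mask)])
            else:  # output layer: sigmoid probability
                acts.append([1.0 / (1.0 + math.exp(-zj)) for zj in z])
        return acts, masks

    def predict_proba(self, x):
        """Predicted probability of the positive class (dropout disabled)."""
        return self.forward(x, train=False)[0][-1][0]

    def train_step(self, x, y):
        """One Adam update on a single (x, y) pair; returns its binary cross-entropy."""
        acts, masks = self.forward(x, train=True)
        p = acts[-1][0]
        delta = [p - y]  # gradient of BCE w.r.t. the pre-sigmoid output
        grads = []
        for l in range(len(self.W) - 1, -1, -1):
            grads.append((l, [[d * a for a in acts[l]] for d in delta], list(delta)))
            if l > 0:
                back = [sum(self.W[l][j][k] * delta[j] for j in range(len(delta)))
                        for k in range(len(acts[l]))]
                # Chain rule through the dropout mask and ReLU (zero where the unit was off).
                delta = [g * m if a > 0 else 0.0 for g, m, a in zip(back, masks[l - 1], acts[l])]
        self._adam(grads)
        return -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))

    def _adam(self, grads):
        """Apply bias-corrected Adam updates to every weight and bias."""
        self.t += 1
        c1, c2 = 1 - self.b1 ** self.t, 1 - self.b2 ** self.t

        def update(params, m, v, g):
            for i, gi in enumerate(g):
                m[i] = self.b1 * m[i] + (1 - self.b1) * gi
                v[i] = self.b2 * v[i] + (1 - self.b2) * gi * gi
                params[i] -= self.lr * (m[i] / c1) / (math.sqrt(v[i] / c2) + self.eps)

        for l, gW, gb in grads:
            for j, g_row in enumerate(gW):
                update(self.W[l][j], self.mW[l][j], self.vW[l][j], g_row)
            update(self.b[l], self.mb[l], self.vb[l], gb)


def demo(seed=0, n=80, epochs=100):
    """Hypothetical demo on synthetic points (label 1 when x0 + x1 > 0); returns training accuracy."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    Y = [1.0 if x0 + x1 > 0 else 0.0 for x0, x1 in X]
    net = TinyMLP([2, 8, 8, 1], dropout=0.2, lr=0.01, seed=seed)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            net.train_step(x, y)
    correct = sum((net.predict_proba(x) > 0.5) == (y == 1.0) for x, y in zip(X, Y))
    return correct / n


if __name__ == "__main__":
    print(f"training accuracy on synthetic data: {demo():.2f}")
```

In practice a framework such as PyTorch or Keras would supply the layers and the optimizer; this plain-Python version only makes the forward pass, the inverted-dropout masks, and the bias-corrected Adam update explicit. Passing a different `layer_sizes` list, for example `[2, 16, 1]`, changes the depth and width without touching the training code.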




Published

10-04-2024

How to Cite

Chen, Z., Chien, C.-S., & Guan, H. (2024). Visual Insights in Lung Cancer Prediction: Leveraging Data Visualization and Analysis for Building and Tuning Machine Learning Models. Highlights in Science, Engineering and Technology, 92, 188-193. https://doi.org/10.54097/a9a6xf33