Survivals of Titanic Prediction Utilizing Tree-based Machine Learning Models

Authors

  • Tianyi Zhao

DOI:

https://doi.org/10.54097/fwnnrc23

Keywords:

Supervised learning, feature importance, Titanic.

Abstract

The sinking of the Titanic is a well-known tragedy. Although it happened more than a century ago, researchers still study survival patterns to gain insight into human behavior in catastrophes. This paper applies three machine learning techniques (decision tree, random forest, and gradient boosting) to a binary classification task: predicting whether a passenger survived. The selected models are all tree-based, which makes it convenient to examine feature importance. In the preprocessing stage, all numerical features are discretized. The paper first compares the performance of the models, then uses the best-performing model to generate and study feature importance. The results show that the decision tree classifier with a maximum depth of seven achieves the highest accuracy, 0.78. The three models perform similarly, which suggests that the findings are robust. The feature importance generated by the decision tree classifier shows that sex and social status significantly affect survival. In addition, whether the person is a child also makes a difference. The discretized features do not have much influence on the survival outcome. The paper concludes that the tuned decision tree classifier is the best model for studying the features considered here, but that the created features are not effective enough.
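To make the two ideas in the abstract concrete (discretizing a numerical feature, then ranking features by how much they reduce impurity), the following is a minimal sketch in plain Python. It is not the paper's pipeline: the bin edges, the helper names (`discretize`, `gini`, `impurity_decrease`), and the eight toy records are all assumptions made up for illustration, and the score is a single-level, multiway Gini decrease rather than the importance a fitted tree library would report.

```python
from collections import Counter


def discretize(value, edges):
    """Return the index of the bin that `value` falls into.

    `edges` are ascending upper bounds; anything above the last edge
    lands in the final bin.
    """
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(feature, labels):
    """Gini decrease from partitioning `labels` by each distinct feature value."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Made-up toy records (NOT the Kaggle data): (age, sex, survived)
toy = [
    (4, "female", 1), (30, "female", 1), (45, "female", 1), (35, "female", 1),
    (8, "male", 1), (25, "male", 0), (40, "male", 0), (60, "male", 0),
]

AGE_EDGES = [12, 50]  # hypothetical bins: child (<= 12), adult (<= 50), senior

age_bin = [discretize(age, AGE_EDGES) for age, _, _ in toy]
sex = [s for _, s, _ in toy]
survived = [y for _, _, y in toy]

# Higher score = the feature separates survivors from non-survivors better
# on this toy sample.
scores = {
    "age_bin": impurity_decrease(age_bin, survived),
    "sex": impurity_decrease(sex, survived),
}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The scores depend entirely on the invented sample, so they say nothing about the real dataset. Tree libraries accumulate this kind of impurity decrease over every binary split in the fitted tree, which is why the numbers they report can differ from a one-level calculation like this.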

Published

01-09-2024

How to Cite

Zhao, T. (2024). Survivals of Titanic Prediction Utilizing Tree-based Machine Learning Models. Highlights in Business, Economics and Management, 40, 284-288. https://doi.org/10.54097/fwnnrc23