Titanic Survival Prediction with Enhanced Random Forests

Authors

  • Shengfan Xu

DOI:

https://doi.org/10.54097/tfxr3t12

Keywords:

Titanic dataset, Random Forest, Feature Weighting, Class Imbalance, Bayesian Optimization.

Abstract

This paper proposes an enhanced random forest framework designed to address key challenges in the Titanic survival prediction task, including class imbalance, feature heterogeneity, and limited sample size. The method integrates an entropy-based adaptive feature weighting mechanism to amplify the influence of critical socio-demographic features—such as gender and passenger class—during decision tree splits, thereby improving split reliability and model interpretability. To mitigate bias arising from the underrepresentation of survivors (minority class), SMOTE is employed to synthetically balance the training data. Furthermore, Bayesian optimization is utilized for efficient and robust hyperparameter tuning, enhancing generalization performance. Extensive experiments on the Kaggle Titanic dataset demonstrate that the proposed approach consistently outperforms a range of baselines—including logistic regression, SVM, standard random forests, XGBoost, and MLP—in terms of accuracy, recall, F1-score, and AUC. Ablation studies confirm the complementary contributions of each component, while error analysis reveals systematic misclassifications in specific subgroups (e.g., male third-class passengers), offering insights into model behavior and limitations. The framework not only achieves superior predictive performance but also improves fairness and stability, presenting a principled and extensible solution for classification tasks on small, imbalanced, and heterogeneous tabular datasets.
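The entropy-based adaptive feature weighting described in the abstract can be illustrated with a minimal sketch: per-feature information gains are normalized into weights that could then bias the probability of drawing each feature as a split candidate inside a random forest. This is an assumption-laden illustration in NumPy, not the authors' implementation; the function names and toy data are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Entropy reduction from partitioning labels by a categorical feature."""
    base = entropy(labels)
    cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        cond += mask.mean() * entropy(labels[mask])
    return base - cond

def feature_weights(X, y):
    """Normalize per-feature information gains into sampling weights."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    gains = np.clip(gains, 1e-12, None)  # keep every feature strictly positive
    return gains / gains.sum()

# Toy rows loosely mirroring Titanic-style features:
# columns: sex (0/1), pclass (1/2/3), embarked (0/1/2)
X = np.array([
    [0, 1, 0], [0, 1, 1], [1, 3, 0], [1, 3, 2],
    [0, 2, 1], [1, 2, 0], [1, 3, 1], [0, 1, 2],
])
y = np.array([1, 1, 0, 0, 1, 0, 0, 1])  # survival label

w = feature_weights(X, y)
# In a weighted forest, w would set the probability of sampling each
# feature as a split candidate at every tree node, amplifying
# high-information features such as sex and passenger class.
print(w)
```

On this toy data, sex and passenger class receive most of the weight, matching the abstract's claim that socio-demographic features dominate the splits; the SMOTE and Bayesian-optimization components would wrap around this weighting in the full pipeline.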


References

[1] Howells R. Atlantic crossings: nation, class and identity in Titanic (1953) and A Night to Remember (1958). Historical Journal of Film, Radio and Television, 1999, 19 (4): 421–438.

[2] Eaton J P, Haas C A. Titanic: Triumph and Tragedy. W. W. Norton & Company, 1994.

[3] Dua D, Graff C. UCI machine learning repository. [Online]. Available: http://archive.ics.uci.edu/ml, 2019.

[4] Hosmer D W, Lemeshow S, Sturdivant R X. Applied Logistic Regression. Wiley, 2013.

[5] Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20 (3): 273–297.

[6] Breiman L. Random forests. Machine Learning, 2001, 45 (1): 5–32.

[7] Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016: 785–794.

[8] Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T Y. LightGBM: a highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems (NeurIPS), 2017: 3149–3157.

[9] Al-Hayik U H S, Abu-Naser S S. Chances of survival in the Titanic using ANN, 2023.

[10] Lundberg S M, Lee S I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS), 2017: 4765–4774.

[11] Chawla N V, Bowyer K W, Hall L O, Kegelmeyer W P. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002, 16: 321–357.

[12] Kaggle. Titanic – machine learning from disaster. [Online]. Available: https://www.kaggle.com/c/titanic. Published: 2012-09-12, Accessed: 2025-09-22.

Published

29-01-2026

Section

Articles

How to Cite

Xu, S. (2026). Titanic Survival Prediction with Enhanced Random Forests. Academic Journal of Science and Technology, 19(2), 439-444. https://doi.org/10.54097/tfxr3t12