Traffic Accident Severity Prediction Based on Data Cleaning and Machine Learning (Random Forest / XGBoost)
DOI:
https://doi.org/10.54097/h8cq6864
Keywords:
US Accident; Random Forest; XGBoost
Abstract
Traffic accidents are a growing global concern, with significant costs to human life and economic sustainability. The US Accidents dataset covers 2016 to 2023 and provides an extensive record of accidents across the United States, including detailed attributes of each accident and the accompanying environmental data. This study uses that database to predict the severity level of accidents. The work centers on careful data cleaning to preserve the integrity of the dataset. After preprocessing, the cleaned data was modeled with two Machine Learning algorithms, Random Forest and XGBoost, chosen for their established ability to handle complex datasets with many variables and to produce accurate predictions. Both models performed well on the cleaned data. To validate their reliability and performance, we used the confusion matrix, which shows how each model's predictions are distributed across the severity classes and exposes true positives, false negatives, and related metrics. Random Forest alone achieved substantial precision, and XGBoost improved prediction accuracy further. To improve the results again, a Voting Classifier was implemented to combine the strengths of the two primary models, which raised overall accuracy. These findings underline the role of advanced data analytics and Machine Learning in understanding traffic accident dynamics. In conclusion, applying state-of-the-art Machine Learning techniques to well-curated datasets can substantially improve both the understanding and the prediction of traffic accident severity. Such insights support the development of more effective preventive measures and safety protocols for a safer traffic environment.
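The evaluation step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' code. The severity labels, the probability values, and the helper names (`soft_vote`, `confusion_matrix`, `per_class_counts`) are all hypothetical, and soft voting (averaging class probabilities) is an assumption, since the abstract does not say whether the Voting Classifier used hard or soft voting.

```python
# Toy illustration only: the probabilities below are invented, not study results.

def soft_vote(labels, *model_probas):
    """Average each model's class probabilities per sample and pick the top class."""
    voted = []
    for sample_vectors in zip(*model_probas):
        # Mean probability per class across the models for this sample.
        avg = [sum(ps) / len(ps) for ps in zip(*sample_vectors)]
        # Ties resolve to the first (lowest-index) class.
        best = max(range(len(avg)), key=avg.__getitem__)
        voted.append(labels[best])
    return voted

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_counts(matrix, labels):
    """One-vs-rest true positives, false positives, and false negatives."""
    counts = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                    # true class i, predicted elsewhere
        fp = sum(row[i] for row in matrix) - tp     # predicted i, true class elsewhere
        counts[label] = {"tp": tp, "fp": fp, "fn": fn}
    return counts

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical severity classes and six hypothetical test samples.
LABELS = [1, 2, 3, 4]
y_true = [2, 2, 3, 4, 2, 3]

# Stand-ins for the per-class probabilities two fitted models might output.
rf_proba = [
    [0.1, 0.7, 0.1, 0.1], [0.1, 0.4, 0.5, 0.0], [0.0, 0.3, 0.6, 0.1],
    [0.0, 0.2, 0.3, 0.5], [0.2, 0.6, 0.2, 0.0], [0.0, 0.6, 0.4, 0.0],
]
xgb_proba = [
    [0.0, 0.8, 0.2, 0.0], [0.0, 0.7, 0.3, 0.0], [0.0, 0.2, 0.7, 0.1],
    [0.0, 0.1, 0.6, 0.3], [0.1, 0.8, 0.1, 0.0], [0.0, 0.3, 0.7, 0.0],
]

ensemble_pred = soft_vote(LABELS, rf_proba, xgb_proba)
cm = confusion_matrix(y_true, ensemble_pred, LABELS)
counts = per_class_counts(cm, LABELS)

for label, row in zip(LABELS, cm):
    print(f"true severity {label}: {row}")
print("per-class counts:", counts)
print("ensemble accuracy:", accuracy(y_true, ensemble_pred))
```

With a real pipeline, scikit-learn's `sklearn.metrics.confusion_matrix` and `sklearn.ensemble.VotingClassifier` (with `voting="soft"` or `voting="hard"`) cover the same steps. Whether the study used those exact components is not stated in the abstract.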
Copyright (c) 2024 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







