Credit Default Analysis and Prediction Based on Machine Learning

Yuxiang Lai

doi:10.54097/hbem.v21i.14762


Keywords:

Credit Fraud; Random Forests; Artificial Neural Networks.

Abstract

The Credit Default Prediction project aims to develop an efficient machine learning model that accurately predicts loan default risk on the Lending Club platform. The model is built on historical loan data, including borrowers' personal information, financial metrics, and credit history records. The workflow comprises data preprocessing, exploratory data analysis, feature engineering, model selection, and performance evaluation. For model selection, the project employs two machine learning algorithms, Artificial Neural Networks (ANN) and Random Forest, both of which are widely used for handling large volumes of borrower data and producing reliable risk predictions. The models are trained and evaluated on a large body of historical data so that their predictions hold across a range of scenarios. The project also conducts a feature importance analysis to identify the factors that most strongly influence loan default risk. These insights can strengthen Lending Club's risk assessment process and support better-informed decisions on loan approval and pricing. By combining data-driven predictive modeling with in-depth data analysis, the project aims to improve the efficiency of Lending Club's loan operations and its risk management capabilities, ultimately providing more dependable financial services to investors and borrowers.
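The abstract does not state which evaluation metric or which feature importance method the project uses. The sketch below is one possible, hedged illustration of those two steps, assuming ROC AUC as the evaluation metric and permutation importance (a model-agnostic technique, distinct from Random Forest's built-in impurity-based importance) as the importance method. The functions `roc_auc`, `permutation_importance`, and `toy_model`, along with the toy data, are hypothetical and stand in for a trained Random Forest or ANN applied to real Lending Club records.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random defaulter (label 1) is scored
    above a random non-defaulter (label 0); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[Row], float],
    rows: List[Row],
    labels: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC when each feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = roc_auc(labels, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild every row with column j replaced by its shuffled value,
            # breaking the link between feature j and the default label.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - roc_auc(labels, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: each row is [debt_to_income, noise]; label 1 = default.
rows = [[i / 20, float((i * 7) % 5)] for i in range(20)]
labels = [1 if i >= 10 else 0 for i in range(20)]


def toy_model(row: Row) -> float:
    # Hypothetical stand-in for a trained model: scores by debt_to_income only.
    return row[0]


print(roc_auc(labels, [toy_model(r) for r in rows]))  # 1.0
# The second entry is 0.0 because toy_model ignores the noise column.
print(permutation_importance(toy_model, rows, labels))
```

Because permutation importance only needs a prediction function, the same routine could in principle rank features for either the Random Forest or the ANN, which makes the two models' importance rankings directly comparable.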

