Analysis and Comparison of Fresh Sales Forecasting Models Based on FreshRetailNet-50K Dataset
DOI: https://doi.org/10.54097/hedzye78
Keywords: Sales Forecasting, XGBoost, LSTM, Prophet, SHAP Explainability.
Abstract
As the digital economy and traditional retail become increasingly intertwined, data-driven decision-making has become a vital strategy for improving the efficiency of the fresh produce supply chain. In this paper, the authors systematically compare three popular sales forecasting models, Prophet, Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory (LSTM), on the openly available FreshRetailNet-50K dataset. The dataset provides a multidimensional feature set together with external influencing factors such as weather, promotional activities, and holidays. The experimental results show that XGBoost achieves the highest prediction accuracy of the three models, with the lowest Root Mean Square Error (RMSE) of 0.714 and the highest R-squared (R²) of 0.816, outperforming both Prophet and LSTM. Moreover, the authors apply the SHapley Additive exPlanations (SHAP) method to interpret these results and find that time-series features such as "Lag Sale" and "Rolling Average Sale" contribute most to the predictions, which demonstrates an "inertia effect" in sales over the very short term.
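The lag and rolling-average features and the two evaluation metrics named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the window length, and the toy sales values are hypothetical and are not taken from the paper or from FreshRetailNet-50K. The "previous value" forecast is only a naive baseline that mimics the inertia idea; it is not any of the three models compared.

```python
from math import sqrt


def lag_feature(sales, lag=1):
    """Sales shifted back by `lag` steps; the first `lag` entries have no history."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [None] * lag + list(sales[:-lag])


def rolling_mean_feature(sales, window=3):
    """Mean of the `window` values strictly before each step, so the target is never leaked."""
    out = []
    for t in range(len(sales)):
        if t < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(sales[t - window:t]) / window)
    return out


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (hypothetical, not from the dataset).
sales = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0]
lag1 = lag_feature(sales, lag=1)
roll3 = rolling_mean_feature(sales, window=3)

# Naive baseline: forecast each step with the previous step's sales.
y_true = sales[1:]
y_pred = lag1[1:]
print("RMSE:", rmse(y_true, y_pred))
print("R2:", r_squared(y_true, y_pred))
```

On a series this short and noisy the naive baseline can score an R² below zero, which means it does worse than predicting the mean; R² is not bounded below by 0.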
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

