Regression Forecast of Hong Kong Passenger Flow Based on Baidu Index
DOI:
https://doi.org/10.54097/w5xj3381
Keywords:
Automated Web Information Crawling, Latent Dirichlet Allocation, ARIMA, Elastic Net Regression, Keyword Extraction.
Abstract
Accurate forecasting of tourist arrivals is crucial for rational planning and resource allocation in the urban tourism industry. In recent years, scholars have found that web search data is correlated with tourism demand, which opens new opportunities for such forecasting. This study aims to improve the accuracy of forecasts of Hong Kong tourist arrivals by combining web search data with machine learning methods. First, the paper uses large-scale web crawling to collect text data related to Hong Kong tourism and applies the Latent Dirichlet Allocation (LDA) model to extract 15 keywords. Then, using the search popularity of these keywords on the Baidu Index, the study applies an ARIMA model and Elastic Net Regression separately to forecast tourist arrivals and compares the performance of the two models. The results show that, compared with the traditional ARIMA model, Elastic Net Regression performs better on several key indicators: 1) its Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are significantly smaller when forecasting both total visitor arrivals and overnight visitors; 2) its coefficient of determination (R²) is substantially higher than that of the ARIMA model, indicating a stronger fit to the data; 3) the gap between its training-set and test-set errors is small, suggesting good generalization without obvious overfitting. Overall, forecasting tourist arrivals with the aid of web search data outperforms traditional time series analysis and improves forecasting accuracy.
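The regression and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three features are synthetic stand-ins for standardized Baidu Index keyword series, the target is a synthetic arrivals series, and the function names, `alpha`, `l1_ratio`, and the 48/12 train/test split are all assumptions chosen for illustration. The sketch fits an Elastic Net by coordinate descent in plain Python and reports MSE, RMSE, and R² on the training and test sets, which are the three indicators the abstract compares. The ARIMA baseline is omitted.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)*||y - b - Xw||^2 + alpha*l1_ratio*||w||_1
        + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    Returns (weights, intercept). Hyperparameters are illustrative."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the residual excluding j
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = b + sum(
                    X[i][k] * w[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                z + alpha * (1.0 - l1_ratio)
            )
        # The intercept is not penalized: set it to the mean residual.
        b = sum(
            y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)
        ) / n
    return w, b


def predict(X, w, b):
    return [b + sum(x * c for x, c in zip(row, w)) for row in X]


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, and the coefficient of determination R^2."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1.0 - mse * n / ss_tot}


# Synthetic data (assumption): 60 monthly observations, 3 keyword features,
# of which the third has no true effect on arrivals.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [5.0 + 2.0 * r[0] - 1.0 * r[1] + random.gauss(0, 0.1) for r in X]

# Chronological split: first 48 observations for training, last 12 for testing.
X_train, X_test, y_train, y_test = X[:48], X[48:], y[:48], y[48:]

w, b = fit_elastic_net(X_train, y_train)
train_scores = evaluate(y_train, predict(X_train, w, b))
test_scores = evaluate(y_test, predict(X_test, w, b))
print("train:", train_scores)
print("test: ", test_scores)
```

Comparing `train_scores` with `test_scores` mirrors the abstract's third indicator: a small gap between the two suggests the model generalizes rather than overfits. The split is chronological rather than random because the data is a time series.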
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.






