Multi-factor Stock Forecasting Model Based on Independence Criterion Weighted Bagging Algorithm

Authors

  • Xinyi Sun
  • Wendi Cai
  • Mengyu Li

DOI:

https://doi.org/10.54097/hbem.v12i.8358

Keywords:

Weighted Bagging Algorithm, Feature Selection, Mutual Information, Stock Price Forecasts.

Abstract

Stock price prediction and the selection of important factors are active research topics for statisticians. Machine learning methods must deliver accurate predictions in high-dimensional settings while keeping the output of the prediction model interpretable. A single forecasting model is prone to an imbalance between bias and variance, which shows up as over-fitting or under-fitting. To address this, we propose an interpretable forecasting model: weighted Bagging based on an independence criterion. First, the proposed method uses model averaging to address the tendency of a single model to over-fit or under-fit. Second, when the sub-models in the Bagging framework are interpretable machine learning methods, the proposed method can also output a weighted feature importance for each factor. In addition, both the independence criterion and the sub-model can be chosen freely. An analysis of real data shows that the proposed method has a smaller prediction error than the comparison methods LASSO, ridge regression, random forest and XGBoost. The factor selection model obtained under the minimum prediction error criterion is also more interpretable.
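
The abstract does not spell out how the sub-model weights are computed, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes (1) ordinary least squares as the interpretable sub-model, (2) a binned mutual-information estimate as the independence criterion, (3) a sub-model weight proportional to the dependence between its out-of-bag predictions and the out-of-bag targets, and (4) absolute coefficients as sub-model feature importance. All function names (`fit_ols`, `mutual_information`, `weighted_bagging`, `ensemble_predict`) and the synthetic data are hypothetical, and the code uses only the standard library to stay dependency-free.

```python
import math
import random


def fit_ols(X, y, ridge=1e-8):
    """Least-squares coefficients [intercept, b1, ..., bp] via normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    # Augmented system [Z'Z + ridge*I | Z'y]; the tiny ridge keeps it invertible.
    A = [[sum(r[i] * r[j] for r in Z) + (ridge if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(Z, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]


def mutual_information(a, b, bins=4):
    """Plug-in mutual information (in nats) after equal-frequency binning."""
    def to_bins(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        out = [0] * len(v)
        for rank, i in enumerate(order):
            out[i] = rank * bins // len(v)
        return out
    da, db, n = to_bins(a), to_bins(b), len(a)
    joint, pa, pb = {}, {}, {}
    for u, w in zip(da, db):
        joint[(u, w)] = joint.get((u, w), 0) + 1
        pa[u] = pa.get(u, 0) + 1
        pb[w] = pb.get(w, 0) + 1
    return sum(c / n * math.log(c * n / (pa[u] * pb[w])) for (u, w), c in joint.items())


def weighted_bagging(X, y, n_models=25, seed=0):
    """Fit bootstrap sub-models and weight them by an out-of-bag dependence score."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    betas, scores = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))          # out-of-bag rows
        beta = fit_ols([X[i] for i in idx], [y[i] for i in idx])
        # Assumption: weight ~ dependence between OOB predictions and OOB targets.
        score = mutual_information(predict(beta, [X[i] for i in oob]),
                                   [y[i] for i in oob])
        betas.append(beta)
        scores.append(score)
    total = sum(scores)
    # Fall back to uniform weights if every score is zero.
    weights = [s / total for s in scores] if total > 0 else [1.0 / n_models] * n_models
    # Weighted feature importance: weighted mean of absolute coefficients
    # (meaningful only if the features are on comparable scales).
    importance = [sum(w * abs(b[j + 1]) for w, b in zip(weights, betas)) for j in range(p)]
    return betas, weights, importance


def ensemble_predict(betas, weights, X):
    preds = [predict(b, X) for b in betas]
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(X))]


# Synthetic stand-in for factor data: factor 0 matters most, factor 2 not at all.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * r[0] - 1.0 * r[1] + rng.gauss(0, 0.5) for r in X]

betas, weights, importance = weighted_bagging(X, y)
y_hat = ensemble_predict(betas, weights, X)
mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
print("weighted importance:", [round(v, 3) for v in importance])
print("in-sample MSE:", round(mse, 3))
```

Because the independence criterion and the sub-model are interchangeable in the proposed framework, `mutual_information` and `fit_ols` could be swapped for another dependence measure or another interpretable learner without changing the rest of the sketch.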

Published

16-05-2023

How to Cite

Sun, X., Cai, W., & Li, M. (2023). Multi-factor Stock Forecasting Model Based on Independence Criterion Weighted Bagging Algorithm. Highlights in Business, Economics and Management, 12, 233-242. https://doi.org/10.54097/hbem.v12i.8358