Research on Forecasting the Shanghai and Shenzhen 300 Index Based on the ARIMA-GARCH Model

Abstract: This paper forecasts the closing prices of the Shanghai and Shenzhen 300 Index using time series models. By combining the ARIMA and GARCH models, the study seeks to improve the accuracy of short-term predictions for the index and to strengthen risk management capabilities. The research begins by collecting and preprocessing historical data of the Shanghai and Shenzhen 300 Index. Then, an ARIMA-GARCH model is established, and the model parameters are estimated and tested. Finally, the model is used to forecast the closing prices of the index, and the accuracy of the predictions is evaluated. The results indicate that the constructed ARIMA-GARCH model can provide accurate predictions for the closing prices of the Shanghai and Shenzhen 300 Index. This research is expected to enhance prediction accuracy, improve risk management, increase practical value, and serve as a reference for similar studies.


Introduction
Accurate short-term stock market predictions and risk assessment are vital in today's dynamic financial landscape. This study focuses on the practical application and effectiveness of the ARIMA-GARCH model in forecasting the Shanghai and Shenzhen 300 Index.
By combining the strengths of the ARIMA and GARCH models, this approach enhances the capture of market volatility and nonlinear characteristics. The study aims to enrich time series analysis methods in finance and provide a theoretical foundation for future research. In practical terms, accurate short-term forecasts of the index can assist investors in making informed decisions, optimizing portfolios, and mitigating risks. The volatility forecasts provided by the ARIMA-GARCH model enable effective risk assessment and control. Financial institutions can leverage these predictions to enhance asset allocation, improve investment decisions, regulate markets, protect investors, and promote overall financial market development.

Literature Review
Financial market forecasting has gained significant attention in both academic and practical domains, with the ARIMA-GARCH model being widely applied and studied, particularly in stock price forecasting.
In China, researchers have made notable advancements in utilizing time series analysis for financial data modeling. Zha Zhenghong introduced the ARIMA model to analyze the Shanghai Stock Composite Index in 1999 [1]. Hui Xiaofeng et al. applied the GARCH model to forecast the RMB-USD exchange rate [2]. Liu Yan et al. explored the integration of ARIMA and GARCH models by employing the ARIMA-GARCH model for spot electricity price prediction [3]. These domestic studies signify the progress in applying time series analysis to financial data forecasting in China.
Internationally, the ARIMA-GARCH model has been extensively studied and applied in stock price forecasting. Engle introduced the ARCH model [4], and Bollerslev generalized it into the GARCH model [5]. Ding, Granger, and Engle proposed the ARIMA-GARCH model for modeling stock market returns [6], and Mohammadi et al. applied the ARIMA-GARCH model to forecast crude oil spot prices [7]. The model has also been used in other domains such as tourism forecasting by Chhorn [8] and network traffic prediction by Zhou [9]. These international studies highlight the effectiveness of the ARIMA-GARCH model in different markets.
Overall, the ARIMA-GARCH model has shown promise in financial forecasting, particularly in stock price prediction. The domestic research in China has contributed to the advancement of time series analysis for financial data modeling, while international studies have showcased the model's effectiveness in various markets. However, continuous improvement and development of more accurate and reliable prediction models are necessary to meet the evolving demands of the financial market.

Principle of the ARIMA Model
The ARIMA model, developed by Box and Jenkins and also known as the Box-Jenkins model, is a stochastic time series model. It uses the autocorrelation and partial autocorrelation functions to characterize the dependence structure of the data and to predict future trends. The model is written as ARIMA(p, d, q), where p, d, and q represent the autoregressive order, the differencing order, and the moving average order, respectively. The ARIMA(p, d, q) model can be represented by the equation:

$$\phi(B)(1 - B)^d y_t = c + \theta(B)\varepsilon_t \tag{1}$$

where $c$ is a constant, $B$ is the backshift operator, $y_t$ represents the time series at time $t$, $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$ are the autoregressive and moving average polynomial operators of order p and q, respectively, and $\varepsilon_t$ denotes the residual term.
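As a worked illustration of the operator notation, the derivation below expands a hypothetical ARIMA(1,1,1) model into an explicit recursion. It assumes the sign conventions $\phi(B) = 1 - \phi_1 B$ and $\theta(B) = 1 + \theta_1 B$, with $B$ the backshift operator; the order (1,1,1) is chosen only for brevity and is not the model selected in this study.

```latex
\begin{align*}
(1 - \phi_1 B)(1 - B)\, y_t &= c + (1 + \theta_1 B)\,\varepsilon_t \\
\bigl(1 - (1 + \phi_1) B + \phi_1 B^2\bigr)\, y_t &= c + \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
y_t &= c + (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
\end{align*}
```

The last line shows that a one-step forecast needs only the two most recent observations and the most recent residual.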

Principle of the GARCH Model
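In 1986, Bollerslev developed the GARCH model by extending the ARCH model with lagged conditional variance terms. The GARCH model helps address the "fat-tailed" distribution of time series data, which can enhance forecasting accuracy. It is widely used for analyzing financial data and making decisions related to asset allocation, hedging, risk management, and portfolio optimization. The GARCH(m, n) model can be represented as follows:

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{n} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{m} \beta_j \sigma_{t-j}^2 \tag{2}$$

where $\sigma_t^2$ is the conditional variance of the residual $\varepsilon_t$, m is the order of the GARCH (lagged variance) terms, and n is the order of the ARCH (lagged squared residual) terms, subject to the following constraints:

$$\alpha_0 > 0, \quad \alpha_i \ge 0, \quad \beta_j \ge 0, \quad 0 \le \sum_{i=1}^{n} \alpha_i + \sum_{j=1}^{m} \beta_j < 1$$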

Principle of the ARIMA-GARCH Model
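The ARIMA-GARCH hybrid model combines the ARIMA model and the GARCH model and can be represented as follows:

$$\phi(B)(1 - B)^d y_t = c + \theta(B)\varepsilon_t$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{n} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{m} \beta_j \sigma_{t-j}^2$$

where $0 \le \sum_{i=1}^{n} \alpha_i + \sum_{j=1}^{m} \beta_j < 1$, $y_t$, $c$, $\phi(B)$, and $\theta(B)$ are defined as in the ARIMA model, $\varepsilon_t$ represents the random error term, and m and n denote the orders of the GARCH model and the ARCH model, respectively. The term $\sigma_t^2$ represents the conditional variance of $\varepsilon_t$, reflecting the volatility characteristics of the time series.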
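The conditional variance recursion can be sketched for the GARCH(1,1) case as follows. This is a minimal illustration, not the estimation code used in the study: the function name `garch11_variance` and the parameter names `alpha0`, `alpha1`, and `beta1` are hypothetical, the parameters are assumed to be already estimated, and the recursion is started from the unconditional variance as one common choice.

```python
def garch11_variance(residuals, alpha0, alpha1, beta1):
    """Conditional variances of a GARCH(1,1) model (illustrative sketch).

    Implements sigma2[t] = alpha0 + alpha1 * eps[t-1]**2 + beta1 * sigma2[t-1].
    The returned list has len(residuals) + 1 entries; the last entry is the
    one-step-ahead variance forecast.
    """
    # Enforce the positivity and stationarity constraints of the model.
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        raise ValueError("parameters violate the GARCH(1,1) constraints")

    # Assumption: initialise with the unconditional variance.
    sigma2 = [alpha0 / (1.0 - alpha1 - beta1)]
    for eps in residuals:
        sigma2.append(alpha0 + alpha1 * eps ** 2 + beta1 * sigma2[-1])
    return sigma2


# Toy residuals, for illustration only.
variances = garch11_variance([0.0, 2.0], alpha0=0.1, alpha1=0.1, beta1=0.8)
print(variances)
```

A large residual raises the next period's variance, and the `beta1` term lets that increase decay gradually, which is how the model reproduces persistent volatility.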

Research Method
This study utilizes the ARIMA-GARCH model to forecast short-term trends in the CSI 300 Index (the Shanghai and Shenzhen 300 Index). The ARIMA model captures and predicts the trend component, while the GARCH model captures and predicts the volatility component. By considering both trend and volatility characteristics, the combined model is intended to improve forecasting accuracy.

Experimental Steps
1. Data Collection and Preprocessing: Historical data of the CSI 300 Index are collected from official websites and then cleaned and preprocessed, including handling of missing values, outlier treatment, data smoothing, and normalization.
2. Building the ARIMA-GARCH Model for Stock Price Forecasting: The time series is first tested for autocorrelation among its observations.
3. The stationarity of the stock price time series is tested using the ADF unit root test.
4. A non-stationary series is differenced to transform the original stock price series into a stationary series.
5. The model order is determined using the AIC criterion, and the parameters are estimated.
6. An ARCH test is performed on the selected ARIMA model to determine the presence of heteroscedasticity. If heteroscedasticity is detected, a GARCH model needs to be established.
7. The residuals of the developed ARIMA-GARCH model are tested for a white noise process. If the criteria are not met, further model improvements are needed.
8. Using the finalized ARIMA-GARCH model, the closing prices of the CSI 300 Index are forecasted, and the relative errors between the predicted and actual values are calculated. Accuracy is evaluated and compared with that of the ARIMA model alone.

Data Source
This article is based on the closing price data of the CSI 300 Index (000300) from February 9, 2015, to April 28, 2023. A stock price forecasting model is established using this data. For testing purposes, the closing prices of the 20 trading days from May 4, 2023, to May 31, 2023, are selected as the test data for evaluating the predicted closing prices.

Data Preprocessing
The results of the Durbin-Watson test are shown in Table 1. The DW statistic is 1.9167, and the p-value is 0.02838. The DW statistic ranges from 0 to 4: a value of 2 indicates no first-order autocorrelation, while values below 2 point toward positive autocorrelation. The statistic here is slightly below 2, and since the p-value is less than the significance level (usually 0.05), we reject the null hypothesis of zero autocorrelation in favor of the alternative hypothesis of positive autocorrelation. Overall, the DW test suggests that this time series exhibits positive autocorrelation, indicating that the current observations may be influenced by lagged observations.
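For reference, the DW statistic itself is the ratio of the sum of squared successive differences of the residuals to their sum of squares, which is approximately 2(1 − ρ), where ρ is the lag-1 autocorrelation. The sketch below computes only the statistic; the function name `durbin_watson` is hypothetical, the input values are toy numbers, and the p-value reported in Table 1 requires a separate procedure that is not shown.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals (illustrative sketch)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy inputs: a sign-alternating series (negative autocorrelation)
# and a constant series (extreme positive autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```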

Stationarity Test
After the autocorrelation test, the stationarity of the time series was evaluated using the ADF test. The ADF test is a commonly used method to determine the stationarity of a time series. If a non-stationary time series has unit roots, differencing can be used to eliminate the unit roots and obtain a stationary series. The results of the ADF test are shown in Table 2. The Dickey-Fuller statistic value is -2.2951 (p-value = 0.4534) with a lag order of 12. The null hypothesis of the ADF test is that the series has a unit root, and the alternative hypothesis is that the series is stationary. Based on the test results and the p-value, the following conclusions can be made: The Dickey-Fuller statistic value is greater than the critical value (-2.2951 > -2.885), so there is insufficient evidence to reject the null hypothesis of unit roots in the time series. Moreover, the p-value (0.4534) is greater than the significance level (usually 0.05), indicating that the series is non-stationary. To address this, differencing was applied to the series. The adfTest function confirmed that the first-order differenced series has a p-value of 0.01, which is lower than the significance level of 0.05, indicating that the differenced series is stationary.
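The logic of differencing and unit root testing can be sketched with a simplified Dickey-Fuller regression. This is an illustration under stated assumptions, not the test used above: it fits Δy_t = a + γ·y_{t-1} + e_t with a constant and no lagged differences (the ADF test in the study uses a lag order of 12), the function names are hypothetical, and the data are a simulated random walk rather than the index series. The resulting t-statistic must be compared with Dickey-Fuller critical values, not with the usual t distribution.

```python
import math
import random


def difference(series):
    """First-order differencing: y[t] - y[t-1]."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def dickey_fuller_stat(series):
    """t-statistic of gamma in  dy_t = a + gamma * y_{t-1} + e_t.

    Simplified sketch: constant only, no lagged differences.
    Strongly negative values are evidence against a unit root.
    """
    dy = difference(series)
    lagged = series[:-1]
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, dy))
    gamma = sxy / sxx                      # OLS slope
    intercept = mean_y - gamma * mean_x    # OLS intercept
    rss = sum((y - intercept - gamma * x) ** 2 for x, y in zip(lagged, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


# Simulated random walk (has a unit root by construction).
rng = random.Random(0)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

print("level series:      ", dickey_fuller_stat(walk))
print("differenced series:", dickey_fuller_stat(difference(walk)))
```

The differenced series yields a far more negative statistic than the level series, mirroring the pattern reported above for the index before and after first-order differencing.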

Model Selection
The auto.arima function is used to automatically select the order of the ARIMA model based on the time series data and provides suggested model orders. According to the output, the suggested model order is ARIMA(2,1,3), where 2 represents the autoregressive (AR) order, 1 represents the differencing order, and 3 represents the moving average (MA) order.
Furthermore, the AIC criterion is used to determine the model order. A lower AIC value indicates a better trade-off between goodness of fit and model complexity. The AIC values for candidate models such as ARIMA(0,1,1), ARIMA(1,1,0), and ARIMA(1,1,1) are calculated for comparison.
The test results in Table 4 reveal a p-value smaller than $2.2 \times 10^{-16}$, rejecting the hypothesis that the squared residual sequence is white noise. This indicates the need to incorporate a conditional heteroscedasticity model, such as the GARCH model. Considering the potential instability introduced by higher-order GARCH models, this study selects GARCH(1,1) to capture the residual dynamics.
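The AIC comparison used for order selection can be sketched as follows, with AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The function names are hypothetical, and the log-likelihood values are made-up placeholders for illustration; they are not the values obtained in this study.

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def select_by_aic(candidates):
    """Return the model order with the lowest AIC.

    `candidates` maps an order tuple (p, d, q) to a pair
    (maximized log-likelihood, number of estimated parameters).
    """
    return min(candidates, key=lambda order: aic(*candidates[order]))


# Placeholder values for illustration only, not results from the paper.
candidates = {
    (0, 1, 1): (-100.0, 2),
    (1, 1, 0): (-101.0, 2),
    (1, 1, 1): (-99.5, 3),
}
for order, (loglik, k) in candidates.items():
    print(order, aic(loglik, k))
print("selected:", select_by_aic(candidates))
```

In this toy case the largest model has the highest likelihood but is not selected, because the penalty term 2k outweighs its small gain in fit.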

Residual Test
The LM test was conducted on the obtained ARIMA-GARCH model to examine the independence of residuals. The results are shown in Table 6.

Forecast Results
Using the established ARIMA(2,1,3)-GARCH(1,1) model, the closing prices of the CSI 300 Index are predicted for the 20 trading days of the test period. The results are shown in Table 7. For comparison, using only the ARIMA(2,1,3) model to predict the closing prices of the CSI 300 Index for the same 20 days yields the results shown in Table 8. The MAPE for the given data is approximately 2.4833%. Comparing the results shows that the ARIMA-GARCH model achieves higher accuracy in predicting the closing prices of the CSI 300 Index than the ARIMA model alone. This indicates that the trained model is capable of effectively forecasting the closing prices of the CSI 300 Index, highlighting the potential application of the ARIMA-GARCH model in financial data prediction.
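The accuracy measures used here can be sketched as follows: the relative error of each forecast is |predicted − actual| / |actual|, and the MAPE is the mean of these errors expressed as a percentage. The function names are hypothetical, and the prices in the example are toy numbers, not values from Table 7 or Table 8.

```python
def relative_errors(actual, predicted):
    """Absolute relative error of each forecast."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = relative_errors(actual, predicted)
    return 100.0 * sum(errors) / len(errors)


# Toy values for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(relative_errors(actual, predicted))
print(mape(actual, predicted))
```

Computing the same measure for both the ARIMA-GARCH forecasts and the ARIMA-only forecasts over the identical test window gives a like-for-like comparison.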

Conclusion
This study applied ARIMA-GARCH modeling to the CSI 300 Index time series data and performed model evaluation and prediction analysis. The Durbin-Watson test revealed positive autocorrelation, while the ADF test indicated that the original series was non-stationary and that the first-order differenced series was stationary. The ARIMA(2,1,3) model was selected for the mean equation and combined with a GARCH(1,1) model for the conditional variance. Both the white noise test and the LM test confirmed that the residual sequence of the model met the assumptions and required no further adjustments. By predicting the test data, we obtained forecasted closing prices, evaluated their accuracy, and compared them with those of the ARIMA model. The findings support the accuracy and feasibility of the ARIMA-GARCH model in short-term forecasting, providing investors with a useful reference for decision-making.