Prediction Model of Physical Goods Sales based on Time Series Analysis

Abstract: With the advent of the big data era, most historical online consumption data are collected and simply stored in the cloud without receiving enough attention. This paper takes online sales of physical goods as its research object, applies time series analysis and sales forecasting theory to data from January 2020 to December 2021, and uses EViews software to build an ARMA model for modeling and prediction. Establishing a mathematical model to analyze and predict online sales of physical goods has, to a certain extent, practical guiding significance for e-commerce enterprises seeking to make optimal business decisions.


Introduction
According to the 49th Statistical Report on the Development of the Internet in China, as of December 2021 the number of online shopping users in China had reached 842 million. As a typical representative of the new business forms of the digital economy, online retail maintained a fast growth rate. In 2021, online retail sales of physical goods accounted for 24.5 percent of the total retail sales of consumer goods [1]. How to effectively develop and use existing online consumption data will therefore play a key role in the operations and decision-making of many e-commerce enterprises in China.
Demand analysis and prediction for online consumers is receiving more and more attention from e-commerce enterprises and plays an increasingly important role in providing a sound basis for their decisions. This paper takes online sales of physical goods as its research object, applies time series analysis and sales forecasting theory to data from January 2020 to December 2021, and establishes a model to obtain forecast results. The aim is to help e-commerce enterprises achieve accurate production, reduce inventory and other costs, and obtain greater profit margins.

Research Status
Scholars have explored the demand analysis of online consumers from different angles. Zuo Wenming, Chen Shao et al. (2019) analyzed the complex behavioral decision-making process and influencing factors of online consumers in B2C activities based on prospect theory and fuzzy multi-attribute decision-making [2]. Dong Yan, Time et al. (2020), drawing on consumer behavior and marketing theory, analyzed from a psychological perspective how online marketing activities influence the purchasing behavior of online consumers [3]. Kansra P, Oberoi S (2022) used a questionnaire to analyze online purchases among young people [4].
With the growth in online shopping users, using the consumption data left on the Internet to predict consumers' purchase demand is of great significance for enterprises seeking to achieve precision marketing. Zhang Jingxuan, Zhang Weiwei (2021) predicted consumer repurchase behavior based on a neural network [5]. Li Weiqing, Chi Maomao et al. (2021) predicted the preferences of online consumers from four perceptual dimensions [6]. Dong J, Huang T, et al. (2022) analyzed historical user data from promotions with a BERT-MLP prediction model to predict repeat purchases by consumers after the promotion [7]. Rajendrakumar R (2022) obtained data by simple random sampling and verified model fit using SEM [8].
Time series analysis and prediction models are applied in a variety of scenarios. Wang Xue (2019) applied a time series prediction model to high-level discipline inquiry [9]. Yan Jinghua et al. (2020) applied time series prediction to theft crimes in order to adjust community patrol strategies [10]. Wang Zhigang et al. (2020) proposed a prediction model based on the Program Evaluation and Review Technique (PERT) and combined it with a time series analysis model; the combined model performs well in predicting the sales volume of supermarket chains [11]. Ke Miao et al. (2021) framed commodity sales forecasting as a multivariable time series forecasting problem and used the historical sales data of an e-commerce online store to build an LSTM network model under the TensorFlow framework. Their results offer important guidance for e-commerce enterprises seeking to improve marketing decisions and manage inventory reasonably [12].

Theoretical Basis
Based on the above literature review, this paper combines sales forecasting theory with a time series model to analyze the historical data of online sales of physical goods and thereby predict subsequent consumer purchasing behavior. A sales forecast is an estimate of the quantity and amount of a commodity that will be sold at a certain time in the future. The data for online sales of physical goods were obtained from the official website of the National Bureau of Statistics (https://data.stats.gov.cn/). This paper takes online sales of physical goods as its research object and uses the cumulative online retail sales and the cumulative online retail sales of physical goods from January 2020 to December 2021 as the sample data (Table 1); these data are time series data. Note: physical goods here are defined in contrast to virtual goods on the network, that is, they are the total of the tangible, real goods sold on all online platforms.
Table 1. Initial data of online sales of physical goods

Outliers Handling Method
Common causes of outliers include weather effects, human error, and unscientific means of collection.
Elimination method: apply the Raida criterion (the 3σ rule). If the experimental data $x$ follow a normal distribution, then $P(|x - \mu| > 3\sigma) \le 0.003$, where $\mu$ and $\sigma$ are the mean and standard deviation of the experimental data, respectively. Data with $x > \mu + 3\sigma$ or $x < \mu - 3\sigma$ should therefore be removed as outliers.
Substitution method: replace the outliers with values close to the true value. In general, discrete data can be replaced with the mode or the median, while continuous data can be replaced with the mean.
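The two handling methods above can be illustrated with a minimal Python sketch. It is not the procedure used in this paper; the function names and the sample values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values):
    """Return the indices of values that violate the Raida (3-sigma) criterion."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(values) if abs(x - mu) > 3 * sigma]


def replace_with_mean(values, outlier_idx):
    """Substitution method for continuous data: fill outliers with the mean
    of the remaining (non-outlier) values."""
    flagged = set(outlier_idx)
    fill = mean(x for i, x in enumerate(values) if i not in flagged)
    return [fill if i in flagged else x for i, x in enumerate(values)]


# Hypothetical readings: nineteen ordinary values and one extreme value.
sample = [10.0] * 19 + [100.0]
flagged = three_sigma_outliers(sample)
cleaned = replace_with_mean(sample, flagged)
print(flagged, cleaned[-1])
```

Note that with very short series the criterion loses power, because a single extreme value also inflates the standard deviation it is compared against.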

Outlier Handling Method Used in This Article
The data in this paper are collected from real sources, so outliers may exist and need to be handled. Because the outliers in these data are few, this paper uses the elimination method and removes them. Taking the real online sales data of physical goods as the original time series, this paper builds a monthly time series model, compares the cumulative value of online retail sales with the cumulative value of online retail sales of physical goods, and forecasts the online demand for physical goods through modeling. Raw data are shown in Table 2.
Inspection showed that the data for January 2020 and January 2022 were blank while the data for the other periods were normal, so the data for January 2020 and January 2022 were excluded. Handling the outliers improved the behavior of the time series. The processed data are shown in Table 2, and the processed time series plots are shown in Figures 1 and 2. As Figures 1 and 2 show, the trend of the time series plots changes markedly after data processing. The cumulative value of online retail sales shows the same pattern as the cumulative value of online retail sales of physical goods, and the data are in line with the actual situation.

Time Series Stationarity Judgment of the Cumulative Value of Online Retail Sales
This paper judges the stationarity of the time series by observing its correlogram. The specific method is to draw the autocorrelation and partial autocorrelation plots. If the correlogram of the sequence shows an approximately linear decay, the sequence is non-stationary; if the correlogram decays like an exponential or sinusoidal function, the sequence is stationary. As can be seen from Figures 3 and 4, the time series of the cumulative value of online retail sales is stationary, so there is no need to difference the sequence to make it stationary.
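The correlogram used for this judgment is produced by EViews in this paper. As a rough illustration of what the plots contain, the following pure-Python sketch computes the sample autocorrelations and derives the partial autocorrelations from them with the Durbin-Levinson recursion. The function names and the example series are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def pacf_from_acf(acf):
    """Partial autocorrelations from autocorrelations (Durbin-Levinson)."""
    pacf = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            phi = [phi_kk]
        else:
            num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf


# Hypothetical series, for illustration only.
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
pacf = pacf_from_acf(acf)
print(acf, pacf)
```

Plotting these values against the lag gives the autocorrelation and partial autocorrelation plots to which the decay rule described above is applied.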

Empirical Analysis
This paper takes the collected online sales values of physical goods as the observations in the time series analysis model, with the time index measured in months. The benefit and effect of the modeling are then evaluated through prediction and analysis. The time series prediction model in this part is built with the EViews software. The monthly time series analysis uses data from February 2020 to December 2021 to forecast online physical goods sales in November 2020.

Forecast and Modeling Process of Online Sales of Physical Goods
The modeling process of the online physical goods sales forecast is shown in Figure 5.
2. Observing the ACF (autocorrelation) and PACF (partial autocorrelation) shows that the ACF tails off (Figure 6), while the PACF drops rapidly after lag 2 (Figure 7), that is, it cuts off at order 2. An AR(2) model was therefore fitted to the data set.
3. Model parameter estimation. The online sales data of physical goods are defined as the variable $X$, and the parameters of the AR(2) model were estimated using the EViews software, giving the output in Figure 8. According to Figure 8, the model is expressed as follows:
$$x_t = 43742.89 + 0.747190\,x_{t-1} - 0.096047\,x_{t-2} + \mu_t, \quad \mu_t \sim N(0, \sigma^2) \tag{1}$$
The tests on the parameters indicate that the AR(1) model is more suitable, so the AR(1) model is finally selected.
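To show how an autoregressive equation such as (1) produces a forecast, the following minimal Python sketch evaluates it one step ahead. It uses the coefficients exactly as written in equation (1); the function name and the two past values are hypothetical and are not taken from the paper's data. Whether the constant reported by the software should be read as an intercept or as a mean depends on the estimation output and is not checked here.

```python
def ar_one_step(history, const, coefs):
    """One-step-ahead AR(p) forecast:
    x_t = const + coefs[0] * x_{t-1} + coefs[1] * x_{t-2} + ...
    The error term has zero mean, so it drops out of the point forecast."""
    p = len(coefs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    return const + sum(c * history[-1 - i] for i, c in enumerate(coefs))


# Coefficients as written in equation (1).
CONST = 43742.89
COEFS = [0.747190, -0.096047]

# Hypothetical past observations x_{t-2}, x_{t-1} (illustration only).
history = [80000.0, 90000.0]
forecast = ar_one_step(history, CONST, COEFS)
print(round(forecast, 2))
```

An AR(1) forecast is the same call with a single coefficient in `coefs`, using the coefficient values from the AR(1) estimation output.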

Forecast and Analysis of Online Physical Goods Sales
The sales data for February 2022 are forecast from the data for January 2020 to December 2021 and then compared with the real data to verify the accuracy of the forecast. The predicted results are shown in Table 4-1 and Figure 9. For February 2022, the true value is 87792 and the predicted value is 86299.39. The deviation between the forecast and the real data is therefore very small, so e-commerce enterprises can use this prediction method to support subsequent management and business decisions and to develop online marketing strategies. This helps them achieve accurate production, reduce the inventory costs caused by product backlog as well as other costs, and obtain greater profit margins.
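The size of the deviation can be made explicit with a short calculation. The sketch below uses only the true and predicted values reported above; the function name is hypothetical.

```python
def forecast_errors(actual, predicted):
    """Return the absolute error and the relative error of a single forecast."""
    abs_err = abs(actual - predicted)
    return abs_err, abs_err / abs(actual)


# True and predicted values as reported in the text.
abs_err, rel_err = forecast_errors(87792, 86299.39)
print(f"absolute error: {abs_err:.2f}, relative error: {rel_err:.2%}")
```

The absolute error is about 1492.61, which corresponds to a relative error of roughly 1.70 percent of the true value.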

Summary
By analyzing online sales of physical goods with the prediction model, this paper reaches the following conclusions, which are intended to provide technical support for e-commerce enterprises in developing marketing strategies.
The deviation between the forecast data and the real data is very small, so e-commerce enterprises can use this prediction method to support subsequent management and business decisions and to develop online marketing strategies, helping them achieve accurate production, reduce the inventory costs of product backlog as well as other costs, and obtain greater profit margins.
Therefore, e-commerce enterprises can adopt appropriate prediction methods to develop more accurate marketing strategies when their own capabilities allow.
Because of limited data access channels and the limits of the authors' research ability, the research in this paper is not complete; selecting specific platforms and categories for modeling and prediction would give better results. Future research on online physical goods sales data can add analysis along the platform and category dimensions, for example by studying the factors that influence demand for Xiaomi mobile phones on JD, so as to make the research more specific and accurate.