Prediction of PM2.5 Concentration Based on a CNN-BiGRU Model

Abstract: Air pollution has long been a public concern. Severe smog not only disrupts travel but also threatens human health. PM2.5 concentration is an important indicator of air quality, so analyzing and predicting it is of long-term significance. A single machine learning model cannot account for the influence of multiple factors on changes in PM2.5 concentration, cannot fully capture the complex characteristics of the data, and cannot highlight the regularity of PM2.5 changes over time; this paper therefore constructs a combined model to further improve prediction accuracy. First, based on the PM2.5 concentrations, air quality data, and meteorological data from stations in New Taipei City, Taiwan Province, the spatiotemporal distribution of PM2.5 at the target stations and its correlation with pollutant and meteorological factors are analyzed, and Spearman correlation analysis is used for feature selection. The combined CNN-BiGRU model then uses the convolution operation of the CNN to extract features from one-dimensional data and combines it with BiGRU, a recurrent neural network that passes information in both directions, to model and predict PM2.5 concentration, drawing on the strengths of both components.


Introduction
With the acceleration of China's industrialization, the accompanying air pollution problem is becoming increasingly serious. Poor air quality affects daily life and even physical and mental health, and PM2.5 causes the most serious harm to the human body [1,2]. Accurate and timely prediction of future PM2.5 concentrations can help people take protective measures.
In recent decades, many scholars have studied the formation, diffusion, and concentration prediction of PM2.5. Earlier research mostly used traditional prediction methods based on mechanism models, which dynamically model the transport and formation of PM2.5 from the geographical environment, meteorological conditions, and even the industrial level of a region. Most of these studies use CMAQ models [3][4][5][6]. For example, Lee et al. [7] used the MM5-SMOKE-CMAQ modeling system to generate meteorological fields, prepare emissions, and simulate air quality, respectively; Jiang et al. [8] used the Community Multiscale Air Quality (CMAQ) model to predict the daily average concentration of fine particulate matter (PM2.5). Another approach is to establish regression models. For example, Wang Weirud et al. [9] used Box-Jenkins theory to build an ARIMA model for the PM2.5 concentration in Hangzhou and found optimal parameter values with good prediction accuracy; Wang Juan [10] used multiple regression analysis and the gray correlation degree method to study air pollution in most areas of the city; Wang Jing et al. [11] proposed a Bayesian hierarchical autoregressive model for the synchronous prediction of PM2.5. Most of these traditional PM2.5 modeling methods require complex calculations and suffer from low accuracy.
In recent years, machine learning algorithms have become a popular research topic. AmA et al. [12] established two different models, a support vector machine (SVM) and an artificial neural network (ANN), and showed that the neural network predicts PM2.5 more accurately than the support vector machine. Chen et al. [13] established a random forest model and two traditional regression models to estimate ground PM2.5 concentrations, and showed that the random forest model is significantly more accurate than the two regression models. Joharestani et al. [14] studied the importance of features for PM2.5 prediction in urban Tehran using random forest, extreme gradient boosting, and deep learning methods. Liang Xiguan [15] and Kong Yu et al. [16] built tree-based models to predict PM2.5 and found them to be more stable than the other models compared. These machine learning models have achieved good results in predicting PM2.5 concentration. However, a single model has limited accuracy and cannot fully capture how multidimensional data change, whereas a combined model can further improve prediction accuracy. For example, Ding et al. [17] combined convolutional neural networks (CNN) with long short-term memory (LSTM) and achieved better predictions than other machine learning methods; Qian et al. [18] used a generalized additive model to combine the PM2.5 estimates of neural networks, random forests, and gradient boosting; Niu et al. [19] combined empirical mode decomposition, the gray wolf optimization algorithm, and an SVR model to achieve high-precision prediction of PM2.5. These studies all used single models as baselines for comparison, and their results indicate that combined models can improve prediction accuracy to a certain extent.
In this paper, a CNN-BiGRU model is established to analyze and predict PM2.5. The convolution operation of the CNN is used to extract features from one-dimensional data, and BiGRU, a recurrent neural network that passes information in both directions, is combined with it to model and predict PM2.5 concentration, drawing on the strengths of both components.

CNN
The convolutional neural network (CNN) is one of the most important networks in the field of deep learning. It is a deep learning model similar to a multilayer perceptron, and is often used for analyzing visual images (such as face recognition, image segmentation, and image classification) and for natural language processing. A CNN is a feedforward neural network that extracts features from data and is characterized by local connectivity and weight sharing. It is mainly composed of five kinds of layers: the data input layer, convolutional layer, pooling layer, fully connected layer, and output layer [20].
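Local connectivity and weight sharing can be illustrated with a minimal sketch of a one-dimensional convolution followed by max pooling. The kernel values, the ReLU activation, and the toy signal below are assumptions chosen for illustration and are not taken from the paper.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide `kernel` over `signal` (no padding, stride 1) and apply ReLU.

    The same kernel weights are reused at every position (weight sharing),
    and each output depends only on a short window (local connectivity).
    """
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        activation = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(max(0.0, activation))  # ReLU
    return out


def max_pool1d(signal, size=2):
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


# Toy signal and a difference kernel that responds to rising values.
features = conv1d_valid([1.0, 2.0, 4.0, 3.0, 1.0, 0.0], kernel=[-1.0, 1.0])
pooled = max_pool1d(features, size=2)
print(features)  # [1.0, 2.0, 0.0, 0.0, 0.0]
print(pooled)    # [2.0, 0.0]
```

In a trained network the kernel weights are learned rather than fixed by hand, and many kernels are applied in parallel to produce several feature maps.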

BiGRU
BiGRU (Bidirectional Gated Recurrent Unit) is an extension of the GRU that consists of two GRU models: a forward GRU, which receives the input in forward order, and a backward GRU, which learns from the input in reverse order [21,22]. Because the PM2.5 concentration studied in this paper is time series data and is strongly affected by chronological order, the BiGRU model is selected to model it. BiGRU can learn the influence of historical information on subsequent information while preserving the correlation between the two, thereby improving the prediction accuracy of PM2.5 concentration. The structure of the BiGRU module is shown in Figure 1.

[23,24]. Therefore, to build an effective model and improve prediction accuracy, it is necessary to establish a reasonable input feature matrix. A good representation of the relationship between the input data and PM2.5 is a key part of building a prediction model.
Feature selection can effectively improve the accuracy of the model, reduce runtime, and reduce the risk of overfitting by reducing redundant and unrelated features. From the correlation analysis in Chapter 3, it can be seen that most pollution factors and meteorological factors are significantly related to the increase and decrease of PM2.5 concentration.
In this paper, the Spearman correlation coefficient is used for feature selection [25]. Take the monitoring point in Xindian District as an example. Figure 2 and Figure 3 show the Spearman correlation between PM2.5 and the pollutants and between PM2.5 and the meteorological factors, respectively; the darker the color, the stronger the correlation. Among the pollutants, PM10 has the strongest correlation with PM2.5, reaching 0.83, followed by CO. The correlation between the meteorological factors and PM2.5 is weak, and most of these variables, such as RH (relative humidity) and AMB_TEMP (temperature), are negatively correlated with PM2.5. Therefore, PM10, CO, NO2, and SO2, which are strongly correlated with PM2.5, are selected as pollutant factors, while RH and WS_HR are selected as meteorological factors; together they serve as auxiliary variables for prediction.
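The selection step can be sketched as follows: the Spearman coefficient is the Pearson correlation of the rank-transformed series, and features whose absolute coefficient reaches a threshold are kept. The toy values, the threshold of 0.5, and the helper names are assumptions for illustration; the paper does not state a numerical cut-off.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, target, threshold):
    """Return the names whose |rho| with the target is >= threshold, plus all scores."""
    scores = {name: spearman(col, target) for name, col in columns.items()}
    selected = [name for name, rho in scores.items() if abs(rho) >= threshold]
    return selected, scores


# Hypothetical hourly readings (toy values, not data from the paper).
pm25 = [10.0, 20.0, 15.0, 30.0, 25.0]
candidates = {
    "PM10": [12.0, 25.0, 18.0, 40.0, 33.0],
    "RH": [80.0, 60.0, 70.0, 50.0, 55.0],
    "AMB_TEMP": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected, scores = select_features(candidates, pm25, threshold=0.5)
print(selected)  # ['PM10', 'RH']
```

Taking the absolute value keeps strongly negative correlations, which matters here because the meteorological variables tend to be negatively correlated with PM2.5.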

Data preprocessing
Missing data reduce the information in the sample and bias the analysis results, and outliers can cause "spurious regression". To address these problems, this paper takes the following measures. Missing data are filled with the K-nearest neighbor (KNN) algorithm. KNN filling considers the "distance" between two samples and uses the average, or the distance-weighted average, of the several nearest observations as the filling value for the missing sample.
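A minimal sketch of KNN filling with the unweighted average is given below. The choice of k = 2, the Euclidean distance over jointly observed columns, and the toy rows are assumptions for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean, computed over the columns observed in both rows.
    Only rows that have a value in the missing column can act as neighbors.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[col] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((dist, other[col]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][col] = sum(nearest) / len(nearest)
    return filled


# Hypothetical rows of (PM10, CO, PM2.5); the last PM2.5 value is missing.
rows = [
    [20.0, 0.30, 10.0],
    [22.0, 0.40, 12.0],
    [60.0, 1.00, 40.0],
    [21.0, 0.35, None],
]
filled = knn_impute(rows, k=2)
print(filled[3][2])  # 11.0, the mean of the two nearest rows' PM2.5 values
```

Because the distance mixes columns with different units, features are usually brought to a comparable scale before the neighbors are searched; the sketch omits this step for brevity.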
Outliers are data points that contain incorrect or invalid entries, that is, parameter values outside the normal range. The box plot method is used: values less than QL - 1.5 IQR or greater than QU + 1.5 IQR are treated as outliers, where QU and QL are the upper and lower quartiles, respectively, and IQR = QU - QL is the interquartile range. The detected outliers are treated as missing values and filled with the KNN method.
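The box plot rule can be sketched with the standard library. Quartile definitions differ slightly between tools; the "inclusive" interpolation used here is an assumption, as are the toy readings.

```python
import statistics


def iqr_outlier_mask(values, whisker=1.5):
    """Flag values below QL - whisker*IQR or above QU + whisker*IQR."""
    q_l, _, q_u = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q_u - q_l
    low, high = q_l - whisker * iqr, q_u + whisker * iqr
    return [v < low or v > high for v in values]


def mark_outliers_missing(values):
    """Replace flagged outliers with None so they can be imputed afterwards."""
    mask = iqr_outlier_mask(values)
    return [None if flagged else v for v, flagged in zip(values, mask)]


# Hypothetical hourly PM2.5 readings with one implausible spike.
readings = [12.0, 15.0, 14.0, 13.0, 16.0, 150.0, 11.0]
cleaned = mark_outliers_missing(readings)
print(cleaned)  # [12.0, 15.0, 14.0, 13.0, 16.0, None, 11.0]
```

For these readings QL = 12.5 and QU = 15.5, so the accepted range is 8.0 to 20.0 and only the value 150.0 is marked as missing.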
Because the features have different ranges and their values can differ greatly in magnitude, the data are standardized using standard deviation (z-score) standardization, after which each feature has a mean of 0 and a standard deviation of 1. The transformation is given in Equation (20):

$$x^{*} = \frac{x - \mu}{\delta} \tag{20}$$

where $x$ is the original value, $\mu$ is the mean of the original data, and $\delta$ is the standard deviation of the original data.
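A short sketch of this standardization and its inverse follows. It assumes the population standard deviation; the paper does not say whether the population or sample form is used, and the function names are illustrative.

```python
import statistics


def standardize(values):
    """Return z-scores together with the mean and standard deviation used."""
    mu = statistics.fmean(values)
    delta = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mu) / delta for v in values], mu, delta


def destandardize(z_values, mu, delta):
    """Map standardized predictions back to the original units."""
    return [z * delta + mu for z in z_values]


z, mu, delta = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mu, delta)  # 5.0 2.0
print(z)          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Keeping the mean and standard deviation makes it possible to convert the model's standardized output back to a concentration before the errors are evaluated.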

Model construction
The hybrid CNN-BiGRU model constructed in this paper first uses a one-dimensional CNN to mine the nonlinear characteristics of the data, which improves running speed and removes some unstable factors. The CNN output is then used as the input of the time series model, BiGRU, which extracts hidden temporal patterns, captures both long-term and short-term dependencies in the series, and extracts feature vectors through forward and backward passes. The model thus combines the strength of the CNN in learning local, correlated features with the strength of BiGRU in learning temporal representations, further improving the prediction accuracy of PM2.5 concentration. In this study, eight data features are used. The input is the data for the past hour, and the output is the PM2.5 concentration for the next hour. That is, the input is a (1 × 8) matrix and the output is a (1 × 1) matrix.
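The pairing of one hour of features with the next hour's PM2.5 value can be sketched as follows. The position of PM2.5 in the feature vector and the toy records are assumptions for illustration.

```python
def make_supervised_pairs(records, target_index, lag=1):
    """Pair the feature vector at hour t with the target value at hour t + lag.

    `records` is a chronologically ordered list of per-hour feature vectors.
    """
    inputs, targets = [], []
    for t in range(len(records) - lag):
        inputs.append(list(records[t]))
        targets.append(records[t + lag][target_index])
    return inputs, targets


# Five hypothetical hours with eight features each; column 0 plays the role of PM2.5.
records = [[float(hour + col) for col in range(8)] for hour in range(5)]
inputs, targets = make_supervised_pairs(records, target_index=0)
print(len(inputs), len(inputs[0]))  # 4 8
print(targets)                      # [1.0, 2.0, 3.0, 4.0]
```

Each input row corresponds to the (1 × 8) matrix described above and each target to the (1 × 1) output; the last hour has no successor and therefore yields no training pair.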
The construction of the CNN-BiGRU model is divided into two parts: the structural design of the one-dimensional CNN and the structural design of the BiGRU model. The input samples constructed in this paper are divided into data segments of a certain length, from which a one-dimensional CNN can effectively extract local features. Because this paper uses one hour of historical data on pollutants and meteorological factors to predict the PM2.5 concentration in the next hour, the network should not be complex. Based on previous experience, one convolution layer and one pooling layer are used to avoid increasing the amount of computation.
The structural design of the CNN-BiGRU model is also based on past experience. It mainly consists of the following parts: input layer, convolution layer, pooling layer, BiGRU layer, fully connected layer, and output layer. The input layer is mainly used for data preparation, the convolution layer extracts features, and the pooling layer downsamples the data. In general, the convolution and pooling layers of a convolutional neural network are stacked together to extract and compress features along the temporal dimension. The convolutional part passes the extracted information to the bidirectional recurrent BiGRU layer, which processes the data in the forward and reverse directions and is followed by a fully connected layer. The model structure is shown in Figure 4.
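The data flow through these layers can be sketched as a NumPy forward pass with random, untrained weights. This is only an illustration of the layer order and tensor shapes, not the authors' implementation: the kernel width, number of filters, hidden size, pooling size, the common GRU gate formulation, and the choice to treat the eight features as a length-8 sequence with one channel are all assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_relu(x, kernels, bias):
    """x: (steps, channels); kernels: (width, channels, filters). Valid conv + ReLU."""
    width = kernels.shape[0]
    steps_out = x.shape[0] - width + 1
    out = np.stack([
        np.tensordot(x[t:t + width], kernels, axes=([0, 1], [0, 1]))
        for t in range(steps_out)
    ]) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the step axis of x: (steps, filters)."""
    steps_out = x.shape[0] // size
    return x[:steps_out * size].reshape(steps_out, size, -1).max(axis=1)


def gru_pass(x, params, reverse=False):
    """Run one GRU over x: (steps, features); return states in input order."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(bz.shape[0])
    states = []
    sequence = x[::-1] if reverse else x
    for x_t in sequence:
        z = sigmoid(Wz @ x_t + Uz @ h + bz)              # update gate
        r = sigmoid(Wr @ x_t + Ur @ h + br)              # reset gate
        h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h) + bh)  # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    states = np.stack(states)
    return states[::-1] if reverse else states


def init_gru(rng, n_in, n_hidden):
    """Random (W, U, b) triples for the update gate, reset gate, and candidate."""
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return tuple(m for _ in range(3)
                 for m in (w(n_hidden, n_in), w(n_hidden, n_hidden), np.zeros(n_hidden)))


def cnn_bigru_forward(sample, weights):
    x = sample.reshape(-1, 1)                             # (8,) -> (8 steps, 1 channel)
    x = conv1d_relu(x, weights["conv_k"], weights["conv_b"])  # -> (6, 4)
    x = max_pool1d(x, size=2)                             # -> (3, 4)
    fwd = gru_pass(x, weights["gru_fwd"])                 # -> (3, 5)
    bwd = gru_pass(x, weights["gru_bwd"], reverse=True)   # -> (3, 5)
    last = np.concatenate([fwd[-1], bwd[0]])              # final state of each direction
    return float(weights["dense_w"] @ last + weights["dense_b"])


rng = np.random.default_rng(0)
weights = {
    "conv_k": rng.normal(scale=0.1, size=(3, 1, 4)),
    "conv_b": np.zeros(4),
    "gru_fwd": init_gru(rng, n_in=4, n_hidden=5),
    "gru_bwd": init_gru(rng, n_in=4, n_hidden=5),
    "dense_w": rng.normal(scale=0.1, size=10),
    "dense_b": 0.0,
}
prediction = cnn_bigru_forward(np.linspace(-1.0, 1.0, 8), weights)
```

Here `prediction` is a single standardized value corresponding to the (1 × 1) output. In practice the weights would be learned by minimizing a loss such as MAE in a deep learning framework rather than drawn at random.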

Evaluation criteria
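In this paper, the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²) are selected to evaluate the prediction performance of the model. They are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $n$ is the number of samples. The smaller the RMSE and MAE, the better the model performs; the larger the R², the better the model performs.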
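The three evaluation criteria can be computed with a few lines of code, using their standard definitions. The toy values below are illustrative and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(rmse(actual, predicted), 4))       # 2.5495
print(mae(actual, predicted))                  # 2.5
print(round(r_squared(actual, predicted), 3))  # 0.948
```

RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.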

Result analysis
From the scatter plots in Figure 5 (a) (b) (c), the R² values at the three monitoring points are 0.9111, 0.8890, and 0.8768, respectively. All are greater than 0.85, which shows that the combined model fits well at the three selected stations. The linear fit for the monitoring point in Xindian District is y = 0.9318x + 1.3457; its slope of 0.9318 is close to 1, so the actual and predicted values are close overall. The slopes of the fitted lines for Taoyuan District and Songshan District are 0.87 and 0.88, and their points are also relatively concentrated around the line. Figure 6 (a) (b) (c) show line charts of the actual and predicted values at the three stations for the first 200 hours. When the PM2.5 concentration is below 5 μg/m³, the predictions fit poorly, and when it is between 10 and 25 μg/m³, the fit is good. Therefore, it can be concluded that the CNN-BiGRU combined model predicts PM2.5 concentration relatively well.

Figure 6 (a) (b) and (c) are the loss curves for the LSTM model, the CNN-GRU model, and the CNN-BiGRU model, respectively. MAE is used as the loss; the ordinate, Loss, is the loss value, and the abscissa, Epoch, is the number of training iterations. The number of epochs is set to 60 for all models. The losses of all three models drop rapidly during the first few epochs and then decrease slowly. At epoch 5, the loss of the LSTM model is about 2.9 and that of the CNN-GRU model is about 3, while both the training loss and the test loss of the CNN-BiGRU model are below 3.
It can be concluded that the CNN-BiGRU model used in this paper reaches a low loss value faster. Furthermore, the loss of the LSTM model in Figure 6 (a) reaches its minimum at approximately epoch 45 and that of the CNN-GRU model in Figure 6 (b) at epoch 50, whereas the loss of the CNN-BiGRU model reaches its minimum at approximately epoch 30.

The R² values of the CNN-GRU model at the three stations are 0.9100, 0.8876, and 0.8692, respectively, so the proposed combined model improves on its prediction accuracy. The improvement can also be expressed as a percentage reduction in RMSE. The proposed model improves significantly on LSTM, especially at the monitoring point in Xindian District. Compared with the combined CNN-GRU model, the improvement is small but still present, because CNN-GRU also combines the ability of the CNN to mine local features of the data with the ability of the GRU to handle long-term and short-term memory. Therefore, it can be concluded that the combined models are better than the single models, and that the proposed combined model has good prediction performance in most cases.

Figure 9. MAE diagram and RMSE diagram of each model

Data from 10 monitoring stations in New Taipei City were selected, and the average value of each feature was used to predict the PM2.5 concentration for the city as a whole. First, the input features were constructed, the relevant feature variables were selected, and the parameters were tuned. The data were then fed into the constructed CNN-BiGRU combined model to predict the PM2.5 concentration in New Taipei City, with the LSTM, GRU, and CNN-GRU models as comparison models. Figure 9 shows bar charts of the evaluation indicators, MAE and RMSE, for all models.
Compared with prediction from a single station's data, the RMSE is higher, because pollutants and meteorological conditions differ across regions and show strong regional characteristics. Nevertheless, in the MAE and RMSE diagrams, CNN-BiGRU has the lowest values, followed by the CNN-GRU model, and the two single models still have the highest errors. Therefore, it can be concluded that the combined model also performs well in predicting the PM2.5 concentration for the whole of New Taipei City.
In summary, the CNN-BiGRU combined model used in this paper makes full use of the advantages of convolutional neural networks and bidirectional gated recurrent unit networks. The CNN, with its strong feature extraction capability, passes its output features to the BiGRU network, which avoids the limitations of single-model prediction and improves prediction ability.

Conclusion
In this paper, we conducted a predictive analysis of PM2.5 concentration data from three monitoring stations in New Taipei City, Taiwan Province. First, we selected the feature variables strongly correlated with PM2.5 as auxiliary predictors, and then modeled and predicted the PM2.5 concentration. We analyzed three evaluation indicators and used three models for comparison: the LSTM model, the GRU model, and the CNN-GRU model. The results show that the CNN-BiGRU model predicts well at the three target stations. Finally, a prediction analysis was conducted on the average PM2.5 concentration in New Taipei City, further verifying that the combined model predicts better than a single model and that the CNN-BiGRU model has the advantage in prediction performance.