Prediction of Carbon Emission Right Price Based on XGBoost Algorithm

Abstract: Reasonable carbon market price prediction can help carbon market participants, such as physical producers, achieve efficient emission reduction through the market mechanism. In this paper, we use XGBoost, an ensemble algorithm in machine learning, to forecast domestic carbon prices from 2013 to 2021. The Pearson coefficient is used to measure the correlation between data features, and PCA dimensionality reduction is applied to the features with high correlation coefficients. Before the PCA step, the feature data are standardized so that they better suit later model training. PCA then selects its hyperparameter by maximum likelihood estimation and outputs the reduced feature dimensions. Finally, the ensemble algorithm XGBoost is used to build a prediction model for the carbon price, and the RMSE from cross validation is used to evaluate the accuracy and error of the predictions. The results confirm that the XGBoost ensemble model has good predictive ability for carbon prices and provide a novel idea for the field of carbon emission price prediction. The model is expected to give carbon market participants a rational basis for investment decisions, helping them avoid the market risk caused by carbon price changes.


Introduction
The carbon emission trading market (carbon market) is an important policy tool for the international community to address climate change. Excessive emission of greenhouse gases, mainly carbon dioxide, has led to a gradual rise in global temperature, while an exclusive focus on cutting carbon dioxide emissions would certainly impose high abatement costs. Quantifying carbon emissions means that the government allocates carbon emission quotas to enterprises; enterprises with surplus quotas can then transfer the surplus to enterprises whose emissions exceed their quotas, achieving an optimal allocation of resources and meeting emission reduction targets at a lower cost.
In June 1992, the United Nations Framework Convention on Climate Change encouraged developed countries to reduce carbon dioxide emissions and set the ultimate goal of "stabilizing the concentration of atmospheric greenhouse gases at a level that would prevent dangerous anthropogenic interference with the climate system. This level should be achieved within a time frame sufficient to make the ecosystem sustainable." In December 1997, the Third Conference of the Parties was held in Kyoto, Japan. The signing of the Kyoto Protocol required the major industrialized countries to reduce their carbon dioxide emissions in 2008-2012 by 5.2% relative to 1990. The Kyoto Protocol also provided three flexible performance mechanisms, namely the Joint Implementation mechanism (JI), the Clean Development Mechanism (CDM) and International Emissions Trading (IET), which gave birth to the international carbon emission trading market. The EU established the EU Emission Trading System (EU-ETS) in April 2005; it has achieved remarkable environmental and economic benefits and has become the most influential carbon emission trading market in the world.
In October 2011, the Notice of the General Office of the National Development and Reform Commission on the Pilot Project of Carbon Emission Trading designated Beijing, Tianjin, Shanghai, Chongqing, Hubei, Guangdong and Shenzhen to carry out pilot carbon emission trading. In 2019, China's carbon dioxide emissions reached 9.826 billion tons, of which 80% came from coal combustion, and the power, industry and transportation sectors accounted for 90% of total emissions. At the 75th United Nations General Assembly in 2020, China pledged to peak its carbon dioxide emissions by 2030 and strive to achieve carbon neutrality by 2060. At the National People's Congress and the Chinese People's Political Consultative Conference in 2021, Premier Li Keqiang pointed out that efforts should be made to achieve the vision of "carbon neutrality". On July 16, 2021, the national carbon market officially started online trading. On March 15, 2022, the Ministry of Ecology and Environment issued the Notice on Key Work Related to the Management of Enterprises' Greenhouse Gas Emission Reports in 2022, which established the list of key emission units in the power generation industry for the second compliance cycle of the national carbon market: enterprises whose verified annual carbon emissions in 2020 or 2021 reached 26,000 tons of carbon dioxide equivalent and that operate power generation units meeting the quota management standards are included in the 2022 list of key emission units under quota management in the national carbon emission trading market.
The research on carbon trading market price prediction has focused on two main approaches: one uses only the historical price data of the carbon market to predict the carbon price [1][2][3][4][5], and the other predicts the carbon price by also incorporating the factors that influence it [6][7][8][9][10]. However, few studies apply machine learning to carbon price prediction in China. This paper therefore uses an XGBoost model to predict the carbon price, providing a new idea for the carbon price prediction field and a practical basis for carbon market participants to make investment decisions, so as to avoid the carbon market risk caused by carbon price changes.

XGBoost algorithm principle
Extreme Gradient Boosting (XGBoost) is an end-to-end gradient boosting tree system proposed by Dr. Tianqi Chen of the University of Washington. XGBoost implements a generalized tree boosting algorithm, of which the Gradient Boosting Decision Tree (GBDT) is a representative. XGBoost adopts the idea of ensembling, adds regularization terms on the basis of GBDT, and replaces the squared-loss-only formulation with a second-order Taylor approximation of the loss function.
The following briefly describes the principle of the XGBoost algorithm.

Boosting additive model representation
In the additive model, the prediction after $t$ trees is

$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$

where $t$ denotes the $t$-th tree to be optimized, $t-1$ indexes the $t-1$ trees already optimized, and $f_t(x_i)$ is the output of the $t$-th tree for sample $x_i$. The objective function is

$Obj^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$

where $\Omega(f_t)$ is a regularization term (the accumulated complexity of the $t$ regression trees). The purpose of adding a regularization term is to prevent overfitting.
Optimization objective: given $T$ leaf nodes, determine their node values $w_j$ so as to minimize the loss function.

Processing of regular terms
Where,  、 is a super parameter and a penalty item for variables. T is equal to leaf nodes of the regression tree. When the number of leaf nodes T of the regression tree is larger, the depth of the tree is deeper, and the over fitting phenomenon is more likely to occur. Therefore, T should be punished, and the punishment intensity should be controlled by  ; When   T j w 1 j 2 is larger, that is, when the node value is larger, the proportion of the node value of this regression tree in all regression trees is too large. At this time, the phenomenon of over fitting also occurs. Therefore,   T j w 1 j 2 is punished, and the punishment is determined by  .

Treatment of the objective function
The regularization term in the objective function is decomposed into the part for the already-optimized first $t-1$ trees (a constant) and the term to be optimized; in the minimization of the objective function, the constant need not be considered. Assume the sample loss is the squared-error loss, and take the loss minimization of leaf node 1, containing samples $y_3$ and $y_5$, as an example. Since $y_3$ and $y_5$ are real values and hence constants, minimizing $(y_3 - w_1)^2 + (y_5 - w_1)^2$ over $w_1$ gives $w_1 = (y_3 + y_5)/2$; that is, the loss of node 1 is minimal when $w_1$ equals the mean of its samples (and similarly for $w_2$, $w_3$ and $w_4$). Reviewing the objective function and the regularization formula: the regularization term is computed per leaf node $j$, so to keep the computation consistent across the formula, the sample loss is likewise rewritten as a sum over leaf nodes and substituted into the objective function. Grouping the objective function by leaf node makes it convenient to compute each node value $w_j$.
The above assumed a squared-error sample loss, but different scenarios use different loss functions, and the loss may not reduce to a polynomial in $w$ that can be solved directly. Therefore a second-order Taylor expansion is applied to the loss function:

$l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) \approx l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)$

where $w_j$ is the only variable and the other terms are constants that need not be considered when optimizing the loss function. The first- and second-order gradients are $g_i = \partial_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}^{(t-1)}\big)$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}^{(t-1)}\big)$. In each leaf node $j$, the sum of the first-order gradients $g_i$ over its samples is denoted $G_j$, and the sum of the second-order gradients $h_i$ is denoted $H_j$.
To sum up, the objective function becomes

$Obj^{(t)} = \sum_{j=1}^{T} \Big[ G_j w_j + \frac{1}{2}\big(H_j + \lambda\big) w_j^2 \Big] + \gamma T$

whose optimal solution is

$w_j^{*} = -\dfrac{G_j}{H_j + \lambda}$

Substituting the optimal solution into the objective function gives the optimal objective:

$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \dfrac{G_j^2}{H_j + \lambda} + \gamma T$
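The closed-form solution above can be checked numerically. The sketch below (an illustration, not the library's internal implementation) uses squared-error loss with a zero initial prediction, so that $g_i = -y_i$ and $h_i = 1$; the sample values are hypothetical:

```python
def optimal_leaf_weight(G, H, lam):
    # w_j* = -G_j / (H_j + lambda)
    return -G / (H + lam)

def optimal_objective(Gs, Hs, lam, gamma):
    # Obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    return -0.5 * sum(G * G / (H + lam) for G, H in zip(Gs, Hs)) + gamma * len(Gs)

# Leaf containing samples y3, y5, current prediction 0: g_i = -y_i, h_i = 1.
y3, y5 = 2.0, 4.0
G, H = -(y3 + y5), 2.0
print(optimal_leaf_weight(G, H, lam=0.0))  # 3.0 -- the mean (y3 + y5) / 2
```

With $\lambda = 0$ the optimal weight reduces to the sample mean, matching the squared-loss example above; a positive $\lambda$ shrinks the weight toward zero.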

Variable Design and Descriptive Statistics of Samples
The data in this paper come from the China Research Data Service Platform (CNRDS) database, which covers daily carbon emission trading data for several provinces and cities (Beijing, Fujian, Guangdong, Hubei, Shanghai, Shenzhen, Tianjin, Chongqing) from August 5, 2013 to October 8, 2021. The indicators include: trading date, city name, trade variety abbreviation, opening price, highest price, lowest price, average transaction price (yuan/ton), closing price, trading volume (tons), and turnover, for a total of 10615 data samples (Table 1).

Research Method Design
To improve the prediction accuracy of the carbon emission right price and the robustness of the model, the modeling in this paper is divided into three steps: ① divide the data set, taking the first 10508 records as the training set and the last 107 records as the test set, and predict the closing price of the last 107 records; ② use the Pearson correlation coefficient to analyze the degree of correlation between features, standardize the data so that they fit the model better, and reduce the dimensionality of the highly correlated features with PCA, using maximum likelihood estimation to choose the number of components; ③ build the XGBoost model to make an initial prediction, then optimize it with cross validation and Bayesian optimization to obtain the final prediction results. Finally, the MAPE indicator is used to evaluate the robustness of the prediction model, and the RMSE and MAE indicators are used to evaluate the accuracy of the prediction results.
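The chronological split in step ① can be sketched as follows; the record counts follow the paper, while the data itself is a placeholder:

```python
# Chronological train/test split: no shuffling, the last 107 rows form the test set.
data = list(range(10615))  # placeholder for the 10615 daily records

train, test = data[:10508], data[10508:]
print(len(train), len(test))  # 10508 107
```

Keeping the split chronological matters for time-series data: shuffling would leak future information into the training set.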

EDA Analysis
To better fit the features for model training, this paper uses the Pearson correlation coefficient to analyze the degree of correlation between feature variables (Table 2) (Figure 1). Since the P-value between features is positively associated with the correlation coefficient, the P-value matrix is not described separately. The heat map shows clearly that the opening price, highest price, lowest price and average transaction price (yuan/ton) are highly linearly correlated with one another (0.7 ≤ |r| < 1), as are the trading volume and the turnover (0.7 ≤ |r| < 1).
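The Pearson coefficient used here can be computed directly; a minimal pure-Python version (equivalent in result to `pandas.DataFrame.corr` on two columns) is:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linearly related series give |r| = 1.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```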
Secondly, the relationship between the features and the target closing price is examined (Figure 2).

Figure 2. Relationship between features and closing price
As shown in Figure 2, the opening price, the highest price, the lowest price, and the average transaction price all have a positive, similarly sloped linear relationship with the closing price. There is no obvious linear relationship between the trading volume or the turnover and the closing price.
The EDA shows that the correlation between features is relatively high. Although there are no obvious outliers, their presence cannot be excluded. Handling data anomalies and preprocessing highly correlated features are indispensable steps in machine learning, so feature engineering is still needed before building the model.

Feature Engineering and XGBoost Model Building
This experiment is performed on a computer with Intel i5-7300HQ 2.5GHz CPU, 8 GB RAM and Windows10 operating system. Under the Anaconda environment, Python language is employed for programming. The IDE is Jupyter Notebook and GPU acceleration is not used.
The model is optimized by minimizing the loss function, and Bayesian optimization pursues the same goal. In general, there are several ways to find the minimum of a function f(x) over the range of x: set the first derivative of the function to 0 and determine the minimum of f(x) from the resulting value of x, which requires that f(x) be differentiable and the resulting equation solvable; use gradient descent or similar optimization methods, which require that f(x) be differentiable and convex; or substitute every possible x into f(x) and pick the minimum, which is obviously extremely expensive. The main idea of Bayesian optimization is to sample points of the loss function f(x) and, by repeatedly tightening the confidence interval between these points while raising the confidence level, build a surrogate function b(x) that approaches the true loss function f(x) arbitrarily closely; the minimum of b(x) is then taken as the minimum of f(x). Bayesian optimization thus solves the problem of finding a minimum when the loss function is complex and non-differentiable.
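A toy illustration of the sequential idea (not TPE itself, and not the paper's tuning code): evaluate the objective at random points, keep the incumbent best, and shrink the sampling interval around it, mirroring the "tightening confidence interval" intuition above. The objective function and all settings are illustrative:

```python
import random

def shrinking_interval_search(f, lo, hi, rounds=30, samples=10, shrink=0.7, seed=0):
    """Toy sequential search: sample f at random points, then narrow the
    interval around the best point found so far. This only illustrates the
    idea of tightening a search region, not the actual TPE surrogate model."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(rounds):
        for _ in range(samples):
            x = rng.uniform(lo, hi)
            y = f(x)
            if y < best_y:
                best_x, best_y = x, y
        width = (hi - lo) * shrink          # shrink the search interval
        lo, hi = best_x - width / 2, best_x + width / 2  # recenter on the best
    return best_x, best_y

# Minimize a simple quadratic with minimum at x = 2.
x, y = shrinking_interval_search(lambda x: (x - 2.0) ** 2, -10.0, 10.0)
print(round(x, 2))  # close to 2.0
```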
Before modeling, the data set is divided: the first 10,508 records form the training set, and the closing price of the last 107 records is predicted. To handle feature outliers and make the feature data better fit the algorithm, the divided data set is standardized, that is, the original data are transformed to have a mean of 0 and a standard deviation of 1. For the highly correlated features, principal component analysis (PCA) with maximum likelihood estimation of the number of components is used to reduce the feature dimension. An XGBoost regressor is then instantiated and tuned with cross validation; KFold uses 5-fold cross validation without shuffling, so as not to disrupt the sample sequence, and Bayesian optimization based on TPE (Tree-structured Parzen Estimator) tunes the model parameters, after which the optimized prediction results are output.
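The standardization step (zero mean, unit standard deviation, equivalent in effect to scikit-learn's StandardScaler on one column) can be sketched as:

```python
import math

def standardize(values):
    """Transform a series to mean 0 and standard deviation 1 (population std)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

z = standardize([10.0, 20.0, 30.0, 40.0])
print(round(sum(z), 10))  # 0.0 -- the standardized series has zero mean
```

In practice the scaler must be fit on the training set only and then applied to the test set, so that test-set statistics do not leak into training.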

Evaluating Indicator
In this paper, the mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate the prediction results, and the mean absolute percentage error (MAPE) is used to evaluate the stability of the prediction model.
Among them, the smaller the MAE and RMSE, the better the prediction results; the smaller the MAPE, the stronger the explanatory power of the features for the target value and the better the model fits the data.
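The three metrics can be written out directly from their standard definitions (a minimal sketch; the sample values are hypothetical):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true) * 100

y_true, y_pred = [100.0, 200.0], [110.0, 190.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))  # 10.0 10.0 7.5
```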

Experimental Results and Analysis
From August 5, 2013 to October 8, 2021, there are 10615 daily time-series records of the opening price, highest price, lowest price, average transaction price, closing price, trading volume and turnover of the domestic carbon emission trading markets. The rise and fall of some of the closing prices are shown in Figure 4. The initialized XGBoost model is tuned by TPE-based Bayesian optimization to make its predictions more accurate. ① Define the objective function, that is, instantiate the regressor and train the model; add 5-fold cross validation that does not disrupt the order of the samples, with the output scoring set to the negative root mean square error. ② Define the parameter space. The hyperparameters of the XGBoost algorithm fall into three types: general parameters (for macro control), booster parameters (for controlling booster details), and learning objective parameters (for determining the learning strategy). This paper selects the ones with the greatest impact on performance, learning_rate, max_depth, min_child_weight, n_estimators and subsample, as the parameter space. ③ Define the optimization function. The surrogate model uses the TPE algorithm; training stops when the loss has not improved for 100 consecutive iterations, with the loss set to the root mean square error (RMSE) and printed on stopping. The optimization results are shown in Table 3.
The final tuned values of learning_rate, max_depth, min_child_weight, n_estimators and subsample are 0.05, 8, 1, 171 and 0.39 respectively. Both the stability and the prediction performance of the Bayesian-optimized model are significantly better than those of the traditional XGBoost model (Table 4), and the data fitting is better (Figure 5).

Conclusion
a. The XGBoost model based on Bayes optimization proposed in this paper can effectively predict the carbon price, and is more accurate than the traditional XGBoost prediction method.
b. The regularization terms incorporated in XGBoost can reduce the complexity of the model, effectively prevent overfitting, reduce the amount of computation, and significantly improve the prediction efficiency of the model. This also reflects the advantage of prediction with the XGBoost ensemble model.
To sum up, the Bayes-XGBoost model proposed in this paper provides a new idea in the field of carbon emission price prediction, which can provide some practical basis for carbon market participants to make investment decisions, so as to avoid carbon market risks caused by carbon price changes.