Research on Stock Prediction Based on Time Series and Machine Learning Algorithms

Abstract: Ever since Bitcoin was created, traders have debated the merits of gold versus Bitcoin, and finding an appropriate, optimal investment portfolio has become a question of close interest. We therefore build a model that gives an optimal trading strategy to maximize the profit of a portfolio. We use time series analysis to predict prices and AHP to quantify investment risk, so as to give the best investment strategy and demonstrate that it is reasonable. First, to obtain the optimal strategy we must predict future price movements of the investment products and decide whether to buy, hold or sell them. We adopt time series analysis with the XGBoost machine learning algorithm, using part of the existing data as the training set and the previous day's data as the test input to forecast each product's price on the next day, and we apply the idea of hedging. By analyzing the investment risks of gold and Bitcoin, we obtain the optimal ratio among cash, Bitcoin and gold. By classifying all possible situations and using the predicted prices, we give the best investment strategy. Under this strategy, an initial investment of $1,000 put into the tuned model grows to a final total of about $250,000. Second, to justify the model, we design three other schemes in addition to the optimal strategy and analyze all four. We use the 1-9 scale of the analytic hierarchy process (AHP) to quantify the risk of the four schemes and determine their relative risk, calculate the final returns of the three other schemes with a dynamic programming model, and combine risk and return to demonstrate that the proposed strategy is reasonable.
Third, to determine how sensitive the trading strategy is to transaction costs, we vary the transaction cost rates of gold and Bitcoin and observe the change in total assets. We conclude that the relationship between the transaction cost rate of gold or Bitcoin and total assets is non-linear, and that these costs affect total assets by changing how often the portfolio is reallocated. Finally, we conduct sensitivity analysis and model improvement, analyze the practicality of the strategy in real life, and write a two-page memo conveying our strategy, model and results to traders.


Problem Background
Bitcoin is a digital virtual currency with high returns, high volatility, and freedom from regulation and taxation. It is sometimes called the new gold: it can replace gold as a new type of safe-haven asset and is used to hedge against inflation. Calling Bitcoin the new gold means that Bitcoin and gold can substitute for each other, that is, the risks they present in market transactions should be opposite, and historical data show that in most cases their risks are indeed opposite. Because the price trends of gold and Bitcoin are uncertain, a trader seeking the maximum return from trading gold, Bitcoin and cash must use their past price trends to decide how to adjust positions. Compared with traditional qualitative investing, which relies on experience and intuition, a quantitative investment model can integrate multi-dimensional information, identify as many influencing factors as possible, and analyze the trend information reflected in prices more comprehensively in order to derive the optimal strategy. The optimal strategy should not simply have the highest return; it should balance return and risk, achieving as high a return as possible at as low a risk as possible. Quantitative methods are increasingly used in practice to provide traders with optimal strategies through models. The task of designing the best position strategy rests on the same idea: the strategy must maximize investors' returns while keeping risk as small as possible.

Restatement of the Problem
Problem one: Build a model that uses the data up to each day to predict the prices of gold and Bitcoin for the next day, producing predicted prices over the five years, and then give the optimal strategy for every trading day. Starting from $1,000 of initial capital and using the proposed model and strategy, calculate the value of the assets on October 9, 2021.
Problem two: Provide evidence that our strategy is the best trading strategy.
Problem three: Determine how sensitive the resulting trading strategy is to transaction costs, and how those costs affect our strategy and results.

Our work
This problem requires us to provide the best investment strategy, where "best" means not only high returns but also low risk. The prerequisite for such a strategy is predicting the next day's prices. For the first question, we use an XGBoost model for prediction. After obtaining the predicted prices of gold and Bitcoin, we divide the total investment into three parts held as Bitcoin, gold and cash. The strategy borrows the idea of hedging in order to build risk into the strategy: we classify the possible situations and, given the estimated prices for the next day, invest according to the predicted prices of gold and Bitcoin. This yields the best investment strategy, which we then apply to the initial capital to find the final total investment. Question two asks us to provide evidence that the strategy is optimal. We construct three additional strategies and calculate the final total investment of each. We then use the analytic hierarchy process (AHP) to compute risk weights for all four strategies, multiply each risk by the corresponding return, and score and rank the four strategies. The comparison shows that the strategy from the first question is the best once both returns and risks are taken into account. The third question requires a sensitivity analysis of the transaction cost rates: we vary the gold and Bitcoin transaction cost rates separately, determine their influence on the final total investment, and present the results graphically.

Assumptions and Justifications
1). Assume that cash is not invested in any form, so its value does not change.
The US dollar is our ultimate yardstick, so its value is held fixed.
2). Assume that investor behavior conforms to the economic man hypothesis.
This excludes all factors other than investment returns and risks.
3). Assume that price forecasts are influenced only by the previous day.
This improves the practicality of the model.
4). Assume that each of the three strategies specified in Question 2 retains only one asset.
This ignores the impact of asset allocation.

Notations
α1: transaction cost rate of gold (dimensionless)
α2: transaction cost rate of Bitcoin (dimensionless)
X: gold trading-day indicator; X = 1 when the gold market is open and X = 0 when it is closed (dimensionless)
B: final amount of Bitcoin assets (bitcoins)
G: final amount of gold assets (troy oz)
C: final amount of cash assets (dollars)
Y: final total investment (dollars)

XGBoost Model

Model Establishment
XGBoost sorts the samples by the value of each feature. While traversing the candidate split points, it measures how much each split reduces the objective and picks the best split point on that feature; the data are then divided into a left and a right child node. The aim is to make the predictions of the ensemble of trees as close to the true values as possible while keeping the best generalization ability. In general, the accuracy of the model depends on how well the objective function is optimized: the better it is optimized, the closer the predictions are to the true values and the better the model generalizes. Both aims are pursued at once by minimizing the loss function while adding a penalty on model complexity, so that error and complexity are optimized jointly. Our objective function is therefore

$$\mathrm{obj}(\theta) = L(\theta) + \Omega(\theta)$$

We choose CART as the base function of the model, so the prediction of the $m$-th CART is

$$f_m(x) = T(x;\theta_m)$$

where $T$ denotes a decision tree and $m$ indexes the base learners. The prediction after $m$ rounds is the prediction of the previous rounds plus the output of the current tree, $\hat{y}^{(m)}(x) = \hat{y}^{(m-1)}(x) + T(x;\theta_m)$, so the error term can be written as

$$L(y,\hat{y}) = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}^{(m-1)}(x_i) + T(x_i;\theta_m)\right)$$

where $L(y,\hat{y})$ sums the loss $l$ between the actual and predicted values over the $n$ training samples. We define the structural complexity of a tree using its number of leaf nodes and the squared L2 norm of its leaf scores:

$$\Omega(f_m) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where, in this formula, $T$ is the number of leaf nodes and $w_j$ is the score of leaf $j$; $\gamma$ controls the penalty on tree complexity, and $\lambda$ sets the weight of the L2 regularization term that prevents the model from overfitting.
Combining the loss (deviation) term and the complexity term gives the objective function for round $m$:

$$\mathrm{obj}^{(m)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}^{(m-1)}(x_i) + f_m(x_i)\right) + \Omega(f_m)$$

We minimize this objective iteratively in the manner of gradient descent, so that the error becomes smaller with each round.
The gradient descent update is

$$\theta_{k+1} = \theta_k - \eta\, \nabla_\theta\, \mathrm{obj}(\theta_k)$$

where $\eta$ is the learning rate. A second-order Taylor expansion of the objective around $\hat{y}^{(m-1)}$ gives

$$\mathrm{obj}^{(m)} \approx \sum_{i=1}^{n}\left[l\left(y_i,\hat{y}^{(m-1)}(x_i)\right) + g_i f_m(x_i) + \frac{1}{2} h_i f_m^2(x_i)\right] + \Omega(f_m)$$

with $g_i = \partial l(y_i,\hat{y}^{(m-1)}) / \partial \hat{y}^{(m-1)}$ and $h_i = \partial^2 l(y_i,\hat{y}^{(m-1)}) / \partial (\hat{y}^{(m-1)})^2$. Dropping the constant term and grouping the samples by the leaf $I_j$ they fall into gives the final form of the objective function:

$$\mathrm{obj}^{(m)} = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}(H_j + \lambda) w_j^2\right] + \gamma T, \qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i$$
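The second-order expansion is what makes tree building cheap: grouping samples by leaf gives a term G_j w_j + ½(H_j + λ)w_j² per leaf, whose minimum is at w_j* = −G_j/(H_j + λ), and the reduction in the objective from splitting a node into left and right children is ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ. The Python sketch below illustrates these two formulas together with the sort-and-scan split search described above. It is a minimal sketch, not the xgboost library's implementation; it assumes squared-error loss l = ½(y − ŷ)², so that g_i = ŷ_i − y_i and h_i = 1, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def grad_hess_squared(y: Sequence[float], y_prev: Sequence[float]) -> Tuple[List[float], List[float]]:
    """g_i and h_i for l = 0.5 * (y - y_hat)^2, taken at the previous round's predictions."""
    g = [float(p - t) for t, p in zip(y, y_prev)]  # first derivative: y_hat - y
    h = [1.0] * len(g)                              # second derivative: constant 1
    return g, h


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Optimal leaf score w* = -G / (H + lambda) for the samples in one leaf."""
    return -sum(g) / (sum(h) + lam)


def best_split(x: Sequence[float], g: Sequence[float], h: Sequence[float],
               lam: float, gamma: float) -> Tuple[float, float]:
    """Exact greedy search on one feature: sort samples by x, then scan every
    boundary between distinct values while accumulating G_L and H_L.
    Returns (gain, threshold); a gain <= 0 means the split is not worth making."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    parent = G * G / (H + lam)
    g_left = h_left = 0.0
    best_gain, best_thr = float("-inf"), float("nan")
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += g[i]
        h_left += h[i]
        if x[i] == x[nxt]:
            continue  # no threshold can separate equal feature values
        g_right, h_right = G - g_left, H - h_left
        gain = 0.5 * (g_left ** 2 / (h_left + lam)
                      + g_right ** 2 / (h_right + lam)
                      - parent) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i] + x[nxt]) / 2
    return best_gain, best_thr


# Toy example: four samples, previous-round prediction 5.0 for all of them.
x = [0.1, 0.2, 0.9, 1.0]
y = [1.0, 2.0, 10.0, 11.0]
g, h = grad_hess_squared(y, [5.0] * 4)
print(leaf_weight(g, h, lam=1.0))              # weight if the node is left unsplit
print(best_split(x, g, h, lam=1.0, gamma=0.0))
```

On the toy data the best threshold separates the two low targets from the two high ones, and raising γ far enough makes the gain negative, which is how γ prunes splits.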

Model Solution
First, we clean the data and remove invalid records. Because the problem requires that only data available by the end of each day be used to predict the next day's prices of gold and Bitcoin, we use the Bitcoin and gold prices of the previous n days as the independent variables and the price on day n + 1 as the dependent variable, and run XGBoost to make the predictions. A genetic algorithm is used to optimize the learning rate. Because the amount of data is limited, we use the first 10% of the data for training; that is, the trader does not trade during the training days. The results are shown in the figure below (only part of the data is shown). The coefficient of determination R² falls within the acceptable range, so the predictions are reasonable. In the figure, the blue line is the actual Bitcoin price and the green line is the predicted Bitcoin price.
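The data preparation above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a single price series, a hypothetical window length n = 3, and the chronological 10% training split from the text. Model fitting is deliberately left out; a naive placeholder forecast (tomorrow's price equals today's) stands in for the XGBoost predictions so that the R² computation can be shown end to end. The function names and the toy series are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged(prices: Sequence[float], n: int) -> Tuple[List[List[float]], List[float]]:
    """Each sample uses the previous n days as features and the next day as target."""
    X, y = [], []
    for t in range(n, len(prices)):
        X.append(list(prices[t - n:t]))
        y.append(prices[t])
    return X, y


def chronological_split(X, y, train_frac: float = 0.10):
    """First train_frac of samples for training, the rest for one-step-ahead
    prediction; no shuffling, so no future data leaks into training."""
    k = max(1, int(len(X) * train_frac))
    return X[:k], y[:k], X[k:], y[k:]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Toy series (not real market data) with a steady upward trend.
prices = [100.0 + 2.0 * t for t in range(50)]
X, y = make_lagged(prices, n=3)
X_train, y_train, X_test, y_test = chronological_split(X, y)
# Placeholder forecaster: tomorrow's price = today's price (last value in the window).
# In the paper this step is the XGBoost model fitted on X_train, y_train.
y_hat = [row[-1] for row in X_test]
print(len(X_train), len(X_test), round(r_squared(y_test, y_hat), 4))
```

Keeping the split chronological matters here: shuffling before splitting would let the model train on prices from after the days it is asked to predict.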

Solution of problem 1
The predicted price for day n + 1 obtained above then determines how the trader's cash, gold and Bitcoin holdings change over time, from which the best strategy is derived. The problem asks us to apply this strategy to an initial investment of $1,000 and report the value of the portfolio on October 9, 2021.
Over the five years, the trader's position is held in three forms: cash, gold and Bitcoin. Starting from the previous day's position, we use the predicted next-day prices of gold and Bitcoin to decide how to adjust the position; assets are transferred according to a comparison of the predicted profit and loss of each move.
Let V_0 and V_i be the total investment value initially and on day i, C_0 and C_i the cash holdings, G_0 and G_i the gold holdings, and B_0 and B_i the Bitcoin holdings. With the gold and Bitcoin holdings valued in dollars at day-i prices, V_i = C_i + G_i + B_i. On day i, let β be the growth rate of gold, R the growth rate of Bitcoin, α1 the transaction cost rate of gold, and α2 the transaction cost rate of Bitcoin. Let X indicate whether day i is a gold trading day: X = 1 when the gold market is open and X = 0 when it is closed. To make a profit, before each trade the expected profit or loss is calculated from the predicted prices of gold and Bitcoin to decide how to move the position. The specific rules are shown in the figure below:

Fig. 4 Optimal strategy diagram
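The decision rules themselves appear only in Fig. 4, so the following is a hypothetical rule of the same kind rather than a reproduction of that figure. It is a minimal Python sketch assuming that holdings are tracked as dollar values, that each trade is charged α1 (gold) or α2 (Bitcoin) on the amount bought or sold, that gold can be traded only when X = 1, and that the trader picks whichever option (no trade, or moving everything into cash, gold or Bitcoin) has the highest predicted value tomorrow given the predicted growth rates β and R. Names such as `decide` are illustrative.

```python
from typing import Dict, Tuple

# (cash, gold, bitcoin), all measured in dollars at today's prices (an assumption).
Holdings = Tuple[float, float, float]


def candidate_positions(h: Holdings, a1: float, a2: float, gold_open: bool) -> Dict[str, Holdings]:
    """Positions reachable today; a fee a1 (gold) or a2 (bitcoin) is charged
    on each dollar bought or sold. gold_open plays the role of X."""
    cash, gold, btc = h
    options: Dict[str, Holdings] = {"hold": h}
    gold_cash = gold * (1 - a1) if gold_open else 0.0  # proceeds from selling gold
    gold_left = 0.0 if gold_open else gold             # gold cannot move when X = 0
    btc_cash = btc * (1 - a2)                           # proceeds from selling bitcoin
    options["cash"] = (cash + btc_cash + gold_cash, gold_left, 0.0)
    options["bitcoin"] = (0.0, gold_left, btc + (cash + gold_cash) * (1 - a2))
    if gold_open:  # gold can only be bought on gold trading days
        options["gold"] = (0.0, gold + (cash + btc_cash) * (1 - a1), 0.0)
    return options


def decide(h: Holdings, beta_hat: float, r_hat: float,
           a1: float, a2: float, gold_open: bool) -> Tuple[str, Holdings]:
    """Choose the position with the highest predicted value tomorrow, given
    predicted growth rates beta_hat (gold) and r_hat (bitcoin)."""
    def predicted_value(p: Holdings) -> float:
        c, g, b = p
        return c + g * (1 + beta_hat) + b * (1 + r_hat)

    options = candidate_positions(h, a1, a2, gold_open)
    # max() returns the first maximal key, so ties keep "hold" (no trade, no fee).
    best = max(options, key=lambda k: predicted_value(options[k]))
    return best, options[best]


print(decide((1000.0, 0.0, 0.0), beta_hat=0.001, r_hat=0.05, a1=0.01, a2=0.02, gold_open=True))
```

Because "hold" is always a candidate and wins ties, a trade happens only when the predicted gain exceeds the fee, which is one mechanism by which higher transaction costs lower the trading frequency.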
Because risk must also be taken into account, we invest according to a hedging approach based on risk factors. The literature [1] shows that the returns of Bitcoin and gold have a fractional cointegration relationship at the 0.01 to 0.99 points, and that the risk trends of the gold and Bitcoin markets are roughly opposite. This provides a basis for combining gold and Bitcoin to hedge. Hedging is the practice of temporarily substituting futures trades for physical trades in order to avoid or reduce losses from adverse price movements. We therefore use the idea of hedging when adjusting positions, as follows. Excluding the training days (the first 10% of the data), the $1,000 is initially held entirely as cash, and the gold and Bitcoin holdings are zero. Positions are then adjusted by the judgment rules above combined with the hedging idea, that is, all the cash is used to buy Bitcoin and gold is sold to convert its value into cash, producing a new portfolio. The results are shown below. As the chart shows, Bitcoin and gold present opposite risks in most cases. Total assets are relatively stable in the early period and increase sharply in the later period.
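The hedging argument rests on gold and Bitcoin tending to move in opposite directions. One simple way to check this on price data is the Pearson correlation of their daily returns, where a negative value supports pairing the two assets. The sketch below is a simplification: it is not the fractional cointegration analysis of [1], the function names are hypothetical, and the two short series are made up for illustration.

```python
import math
from typing import List, Sequence


def daily_returns(prices: Sequence[float]) -> List[float]:
    """Simple daily returns p_t / p_{t-1} - 1."""
    return [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rolling_return_correlation(p1: Sequence[float], p2: Sequence[float], window: int) -> List[float]:
    """Correlation of the two return series over each trailing window of days."""
    r1, r2 = daily_returns(p1), daily_returns(p2)
    return [pearson(r1[i - window:i], r2[i - window:i]) for i in range(window, len(r1) + 1)]


# Made-up series that move in opposite directions every day.
gold = [100.0, 101.0, 100.0, 101.0, 100.0]
btc = [100.0, 99.0, 100.0, 99.0, 100.0]
print(pearson(daily_returns(gold), daily_returns(btc)))
print(rolling_return_correlation(gold, btc, window=3))
```

A rolling window is used because the text describes the relationship as holding "in most cases" rather than always; tracking it over time shows when the hedge weakens.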

Model establishment and preparation
First, we construct three further strategies based on the optimal strategy. Strategy 1 allows only trades between cash and Bitcoin; Strategy 2 allows only trades between cash and gold; Strategy 3 makes no trades at all, so the $1,000 stays in cash (an empty position). We score the four strategies by considering risk and return together and compare them to demonstrate that our strategy is optimal.
Under Strategy 3, the total investment after five years is still $1,000. For Strategies 1 and 2, we use a dynamic programming model to obtain the final total investment. We divide the five-year trading period into N periods and consider, for each period k, the cash used to buy Bitcoin at the end of period k together with the minimum and maximum Bitcoin prices in period k + 1; this yields a recursion for the Bitcoin holdings. Since the initial Bitcoin amount is known, the final Bitcoin asset amount follows by recursion, and the final gold asset amount is obtained in the same way. The final total investment is then Y = C + G + B, the sum of the final cash, gold and Bitcoin assets, with gold and Bitcoin valued in dollars. Once the total investment of each of the four strategies is known, their risks are obtained by the analytic hierarchy process, the risk weights are transformed into a positive (benefit-type) indicator, and the final score of each strategy is the product of this indicator and its return.
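The paper's recursion (buying at the minimum and selling at the maximum price of each period) is not recoverable from the text, so the sketch below swaps in a standard two-state dynamic program for one risky asset with a proportional fee. It tracks, for each day, the best cash balance and the best number of units reachable, and it is hindsight-optimal: it uses the actual price path, so its result is an upper bound for Strategy 1 (Bitcoin with fee α2) or Strategy 2 (gold with fee α1, with an optional mask playing the role of X). The function name and the choice to value the final position at the last price, without a closing sale, are assumptions.

```python
from typing import Optional, Sequence


def best_single_asset(prices: Sequence[float], initial_cash: float, fee: float,
                      tradable: Optional[Sequence[bool]] = None) -> float:
    """Hindsight-optimal final value (in dollars) when only cash <-> one asset
    trades are allowed and each trade costs `fee` times the traded amount.

    cash  = best cash balance reachable by the end of day t
    units = best number of asset units reachable by the end of day t
    tradable[t] = False (X = 0) means no trade is possible on day t.
    """
    def can_trade(t: int) -> bool:
        return tradable is None or tradable[t]

    cash = initial_cash
    units = initial_cash * (1 - fee) / prices[0] if can_trade(0) else 0.0
    for t in range(1, len(prices)):
        if not can_trade(t):
            continue  # market closed: both states carry over unchanged
        p = prices[t]
        # Both updates read yesterday's states, so each day allows one switch.
        cash, units = (max(cash, units * p * (1 - fee)),   # keep cash, or sell all units
                       max(units, cash * (1 - fee) / p))   # keep units, or buy with all cash
    # Final position valued at the last price, without a closing sale.
    return max(cash, units * prices[-1])


print(best_single_asset([100.0, 50.0, 100.0], 1000.0, fee=0.01))
print(best_single_asset([100.0, 50.0, 100.0], 1000.0, fee=0.0, tradable=[True, False, True]))
```

The second call shows why the X indicator matters for Strategy 2: if the market is closed on the day of the dip, the dip cannot be bought.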
The analytic hierarchy process (AHP) is mainly used to analyze qualitative decision problems quantitatively. Its basic idea is to determine the evaluation indicators the problem requires and assign them weights by constructing a judgment matrix in which each indicator serves as a criterion. The matrix is then tested for consistency: the consistency ratio is compared with 0.1, and if it is less than 0.1 the matrix satisfies the consistency requirement. Finally, the weights are calculated from a matrix that passes the test.
We establish the AHP model for the optimal strategy. The consistency index and consistency ratio are

$$CI = \frac{\lambda_{\max} - n}{n - 1}, \qquad CR = \frac{CI}{RI}$$

where $\lambda_{\max}$ is the largest eigenvalue of the $n \times n$ judgment matrix and RI is the random consistency index of the corresponding exponential scale.

Model Solution
Each element that has subordinate elements heads a judgment matrix, and the elements subordinate to it are arranged in order along the first row and the first column, giving an exponential-scale judgment matrix for strategic risk.
The judgment matrix $A = (a_{ij})$ has the following properties:

$$a_{ij} > 0, \qquad a_{ji} = \frac{1}{a_{ij}}, \qquad a_{ii} = 1$$

We obtain the weights with the eigenvalue method, using MATLAB to find the largest eigenvalue and its corresponding eigenvector.
The eigenvalue and eigenvector are computed from $AW = \lambda W$; the normalized eigenvector associated with the largest eigenvalue gives the weight of each strategy.
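The paper solves for the eigenvector in MATLAB; the Python sketch below shows the same eigenvalue method with power iteration, which converges to the principal eigenvector of a positive matrix, and then computes CI and CR. It assumes Saaty's tabulated random index values for the 1-9 scale, the function names are hypothetical, and the 4 × 4 matrix is an illustrative example rather than the paper's judgment matrix.

```python
from typing import List, Sequence, Tuple

# Saaty's random consistency index RI for n = 1..9 (1-9 scale).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(A: Sequence[Sequence[float]], iters: int = 1000,
                tol: float = 1e-12) -> Tuple[List[float], float]:
    """Principal eigenvector of a positive reciprocal matrix by power iteration.
    Returns (weights normalized to sum 1, estimate of lambda_max)."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        v = [x / s for x in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w = v
        if converged:
            break
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n  # average of (AW)_i / W_i
    return w, lam_max


def consistency(lam_max: float, n: int) -> Tuple[float, float]:
    """CI = (lambda_max - n) / (n - 1) and CR = CI / RI (intended for n >= 2)."""
    ci = (lam_max - n) / (n - 1)
    ri = RANDOM_INDEX[n]
    return ci, (ci / ri if ri > 0 else 0.0)


# Illustrative 4x4 risk judgment matrix (not the paper's matrix).
B = [[1, 3, 5, 7], [1 / 3, 1, 3, 5], [1 / 5, 1 / 3, 1, 3], [1 / 7, 1 / 5, 1 / 3, 1]]
weights, lam_max = ahp_weights(B)
ci, cr = consistency(lam_max, len(B))
print(weights, lam_max, ci, cr, "passes" if cr < 0.1 else "fails")
```

For a perfectly consistent matrix (a_ij = w_i / w_j) the iteration recovers w exactly and λ_max = n, so CI = 0; any inconsistency pushes λ_max above n.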
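The results are CI = 0.056818 and CR = 0.063131. Since CR < 0.1, the judgment matrix passes the consistency test. For the n = 4 strategies, these values correspond to a largest eigenvalue λmax = n + (n - 1)CI ≈ 4.1705 and a random index RI = CI/CR ≈ 0.90.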
Finally, the evaluation of the four strategies is given in Table 6.
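The scoring step, multiplying each strategy's return by its positively transformed risk weight, can be sketched as follows. The paper does not state which transformation it uses, so this sketch assumes 1 − w, which turns a larger risk weight into a smaller multiplier; the function name is hypothetical and no values from Table 6 are used.

```python
from typing import Dict, List, Tuple


def rank_strategies(final_assets: Dict[str, float],
                    risk_weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each strategy as final assets x (1 - risk weight) and sort best first.
    The 1 - w 'positive transformation' is an assumption: lower risk -> larger factor."""
    scores = {name: final_assets[name] * (1.0 - risk_weights[name]) for name in final_assets}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With placeholder inputs, a strategy with a somewhat lower return but a much lower risk weight can outrank a higher-return, higher-risk one, which is the trade-off the score is meant to capture.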

Sensitivity Analysis
The problem asks us to determine how sensitive the resulting trading strategy is to transaction costs and how these costs affect our strategy and results. We vary α, the ratio of the transaction cost of trading gold or Bitcoin to total assets. Because gold and Bitcoin have different turnover rates, we use the control-variable method: each rerun of the model changes the transaction cost rate of only one asset. Taking α1 and α2 as the independent variables and final total assets as the dependent variable, we obtain the three-dimensional surface shown below:
Fig. 9 Change of transaction cost - total investment
As the figure shows, when the gold transaction cost rate is held fixed, the final total investment decreases as the Bitcoin transaction cost rate increases, and when the Bitcoin transaction cost rate is held fixed, the final total investment decreases as the gold transaction cost rate increases. Hence the final total investment shows a downward trend as transaction cost rates rise.
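The shape of this analysis can be reproduced without the paper's prediction model. The sketch below uses a hindsight-optimal dynamic program over three all-in states (cash, gold, Bitcoin), which is an upper bound on what any trading rule could earn and not the paper's strategy, and evaluates it on a grid of (α1, α2). It assumes each fee is charged on the traded amount, switching between gold and Bitcoin passes through cash, gold trades only when X = 1, and gold prices on closed days are carried forward; the prices and function names are made up for illustration. Because every trading path loses value as a fee rises, the optimal value can only stay flat or fall, consistent with the downward trend in Fig. 9.

```python
from typing import List, Sequence


def best_final_value(gold: Sequence[float], btc: Sequence[float], gold_open: Sequence[bool],
                     a1: float, a2: float, cash0: float = 1000.0) -> float:
    """Hindsight-optimal final dollar value with all-in positions in cash, gold
    or bitcoin. Fees a1 (gold) and a2 (bitcoin) are charged on the traded amount;
    gold trades only on days with gold_open[t] (X = 1)."""
    cash = cash0
    coins = cash0 * (1 - a2) / btc[0]
    oz = cash0 * (1 - a1) / gold[0] if gold_open[0] else 0.0
    for t in range(1, len(btc)):
        pg, pb, is_open = gold[t], btc[t], gold_open[t]
        btc_to_cash = coins * pb * (1 - a2)
        gold_to_cash = oz * pg * (1 - a1) if is_open else 0.0
        new_cash = max(cash, btc_to_cash, gold_to_cash)
        new_coins = max(coins, cash * (1 - a2) / pb, gold_to_cash * (1 - a2) / pb)
        new_oz = oz
        if is_open:
            new_oz = max(oz, cash * (1 - a1) / pg, btc_to_cash * (1 - a1) / pg)
        cash, coins, oz = new_cash, new_coins, new_oz
    # Final holdings valued at the last prices, without a closing sale.
    return max(cash, coins * btc[-1], oz * gold[-1])


def cost_surface(gold, btc, gold_open, rates: Sequence[float]) -> List[List[float]]:
    """Final value for every (a1, a2) pair: rows vary a1, columns vary a2."""
    return [[best_final_value(gold, btc, gold_open, a1, a2) for a2 in rates] for a1 in rates]


# Made-up prices for illustration only (day 2 is a gold non-trading day).
gold = [100.0, 110.0, 110.0, 120.0, 118.0, 125.0]
btc = [100.0, 90.0, 130.0, 120.0, 160.0, 150.0]
gold_open = [True, True, False, True, True, True]
rates = [0.0, 0.01, 0.02, 0.05]
surface = cost_surface(gold, btc, gold_open, rates)
for a1, row in zip(rates, surface):
    print(a1, [round(v, 2) for v in row])
```

Plotting `surface` against the two rate axes gives a surface of the same kind as Fig. 9; the hindsight bound isolates the cost effect from prediction error.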

Model Evaluation and Further Discussion
First, the XGBoost regression algorithm is trained on 10% of the data set to predict the future prices of gold and Bitcoin, and the predictions are combined with the optimal strategy for asset allocation. Different asset allocations carry different risks and returns, and the model seeks higher returns at lower risk. Using the analytic hierarchy process, we compare several trading schemes to show that our scheme is reasonable. However, it is difficult to quantify risk precisely and relate it to real conditions, so the quantification of risk remains approximate: the optimal scheme is determined by judging risk manually and calculating total assets.
In practice, trading in gold and Bitcoin is influenced not only by their past prices but also by their risks, personal preferences for asset transfers, other traders' positions and commentary, and so on. The model should therefore incorporate more practical factors so that its predictions come closer to reality. In addition, the model should be continually compared with actual outcomes, excluding data distorted by major events, to improve its accuracy.
The objective of this model is to find the optimal strategy, that is, to maximize investors' returns with as little risk as possible. In reality, the public's price predictions are subjective, and trading decisions made in response to price changes are often far from optimal. This model can therefore guide users' investment and offer them better ideas.

Conclusion
For the portfolio problem posed in this paper, which uses prices up to each day to decide whether to buy, hold or sell assets, we first use a time series method to predict the future price trends of gold and Bitcoin. Based on the predictions, we designed an investment strategy that maximizes returns during asset allocation. Applying this strategy, our model turned an initial investment of $1,000 into about $250,000.
To validate the model, we used return and risk as the evaluation criteria, applied the AHP 1-9 scale to assess the risk of four different investment strategies, and combined each strategy's risk with its return to verify that our investment model is optimal.
To test the model's sensitivity to transaction costs, we varied the cost rates and observed how profit changed. We concluded that transaction costs affect profit by changing how often the model reallocates assets: when costs are high, the model reduces the number of reallocations in order to maximize profit. Varying the transaction costs of gold and Bitcoin gives a graph of how profit changes with cost.

Memo
Dear Sir/Madam,
We are the team you hired to design your trading strategy. We are writing this memo to explain the results our team obtained and to offer some suggestions.
First, the trading strategy allows only cash, gold and Bitcoin to be exchanged. Our team used time series analysis and a machine learning algorithm to predict the price movements of Bitcoin and gold: the earliest part of the data served as the training set, the model was trained repeatedly, and a genetic algorithm was used to tune it so that the predictions are as close as possible to the actual values. The predicted prices are the basis of the trading strategy. From them, our team calculates the predicted rise or fall of each asset and classifies the possible situations, so that the strategy gives an appropriate action whatever situation you encounter. Our team presents the optimal strategy as a mind map, which shows how each situation is judged and what decision should be made in it. We also tested the strategy with an initial capital of $1,000: after five years of trading, the total reached nearly $250,000.
To demonstrate that our strategy is reasonable and relatively optimal, our team also designed three alternative strategies. Weighing the risk and return of each, we concluded that among the four strategies ours achieves a higher return at lower risk, so you can be confident in its rationality. You may be concerned that the transaction cost rates of gold and Bitcoin affect the final total return. We tested the effect of each cost rate on the final total investment separately: when the gold cost rate is fixed, the final total investment decreases as the Bitcoin cost rate increases, and when the Bitcoin cost rate is fixed, it decreases as the gold cost rate increases. Therefore, the final total investment falls as transaction cost rates rise. The graph below shows the gold transaction cost rate, the Bitcoin transaction cost rate and the final investment amount, so that you can see the effect directly.
Finally, our advice is intended only as a reference. We hope you will make the best decision based on your own circumstances.