Faster and More Robust GA-Lion and MM-MLP for High-Frequency Trading
DOI: https://doi.org/10.54097/1ec1p845

Keywords: Deep Learning (DL), Gated Recurrent Unit (GRU), Evolved Sign Momentum (Lion), Simulated Annealing (SA), Multilayer Perceptron (MLP)

Abstract
The key challenges in applying deep neural networks (DNNs) to stock high-frequency trading prediction lie in accelerating DNN training and improving the robustness of deep learning (DL) models. High-frequency trading is a race against time, which places high demands on the efficiency of model training. To accelerate DNN training, we propose an improved Evolved Sign Momentum (Lion) algorithm, the Gauss Simulated Annealing Lion (GA-Lion), which significantly improves training speed so that the model can adapt to market dynamics more quickly and meet the real-time demands of high-frequency trading. Meanwhile, to enhance the robustness of DNNs, we design a novel feature extraction module based on the multilayer perceptron (MLP), called the Masked Moving-Average MLP (MM-MLP), which effectively captures the key features of the price series and reduces noise interference through its moving-average mechanism, thereby improving the stability and generalization ability of the model in complex market environments. Combined with a gated recurrent unit (GRU) and optimized by GA-Lion and MM-MLP, our model achieves faster loss convergence on high-frequency trading data while demonstrating greater robustness in highly volatile market environments.
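The abstract does not reproduce the GA-Lion update rule, so the following is a minimal PyTorch sketch of one plausible reading: a standard Lion step (sign of an interpolation between momentum and gradient, with decoupled weight decay) perturbed by zero-mean Gaussian noise whose scale decays under a simulated-annealing-style geometric cooling schedule. The class name, the hyper-parameters sigma0 and cooling, and the cooling rule itself are illustrative assumptions, not the authors' exact formulation.

```python
import torch
from torch.optim import Optimizer


class GALionSketch(Optimizer):
    """Illustrative sketch: Lion step plus Gaussian simulated-annealing noise (assumed form)."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0,
                 sigma0=1e-3, cooling=0.99):
        defaults = dict(lr=lr, betas=betas, weight_decay=weight_decay,
                        sigma0=sigma0, cooling=cooling)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["momentum"] = torch.zeros_like(p)
                    state["step"] = 0
                state["step"] += 1
                m = state["momentum"]
                # Decoupled weight decay, as in Lion / AdamW.
                p.mul_(1.0 - group["lr"] * group["weight_decay"])
                # Lion update: sign of the interpolation between momentum and gradient.
                update = (beta1 * m + (1.0 - beta1) * p.grad).sign_()
                # Assumed simulated-annealing component: Gaussian noise whose
                # standard deviation cools geometrically with the step count.
                sigma = group["sigma0"] * group["cooling"] ** state["step"]
                update.add_(torch.randn_like(p), alpha=sigma)
                p.add_(update, alpha=-group["lr"])
                # Momentum tracks the raw gradient, as in Lion.
                m.mul_(beta2).add_(p.grad, alpha=1.0 - beta2)
        return loss
```

In use it would behave like any other optimizer, e.g. `opt = GALionSketch(model.parameters(), lr=3e-5)` followed by the usual `loss.backward(); opt.step(); opt.zero_grad()` loop; as the noise anneals toward zero, the step reduces to plain Lion.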
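Likewise, a hedged sketch of how a Masked Moving-Average MLP feature extractor could look: a 1-D moving average over the time axis smooths high-frequency noise, random masking of time steps during training encourages robustness, and a small MLP projects the smoothed series into features for a downstream GRU. The window size, mask ratio, and layer layout are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MMMLPSketch(nn.Module):
    """Illustrative Masked Moving-Average MLP feature extractor (assumed form)."""

    def __init__(self, n_features, hidden_dim, window=5, mask_ratio=0.1):
        super().__init__()
        self.window = window
        self.mask_ratio = mask_ratio
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, x):
        # x: (batch, time, features) price/volume series.
        xt = x.transpose(1, 2)                 # (batch, features, time)
        xt = F.pad(xt, (self.window - 1, 0))   # left-pad so the sequence length is preserved
        # Moving average along the time axis to suppress high-frequency noise.
        smoothed = F.avg_pool1d(xt, self.window, stride=1).transpose(1, 2)
        if self.training:
            # Randomly mask a fraction of time steps (training only) for robustness.
            keep = (torch.rand(x.size(0), x.size(1), 1, device=x.device)
                    > self.mask_ratio).float()
            smoothed = smoothed * keep
        return self.mlp(smoothed)              # (batch, time, hidden_dim)
```

The output shape `(batch, time, hidden_dim)` is suitable as input to an `nn.GRU(hidden_dim, ..., batch_first=True)` layer, matching the GRU backbone described in the abstract.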
License
Copyright (c) 2025 Frontiers in Computing and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

