Reliable Stock Prediction: Data, Models, Testing
DOI: https://doi.org/10.54097/rk4ss359
Keywords: stock prediction; time series; multimodal; backtest overfitting; reproducibility; execution costs.
Abstract
In recent years, deep learning and large language models have entered almost every discussion on stock price prediction. Many reported results look strong on paper, but often rely on clean data, cheap trading, and generous assumptions that rarely hold in real markets. This review looks at studies from 2020–2025 through three practical lenses. First, data and task design: how prices, order books, and news are collected, filtered, labeled, and aligned with the information actually available at decision time. Second, models and multimodal methods: long-horizon forecasters, order-book based models, and text–market fusion schemes, all compared against simple but competitive baselines instead of strawman benchmarks. Third, evaluation and implementation: temporal cross-validation, tests for backtest overfitting, explicit treatment of transaction costs and liquidity, and the engineering and compliance constraints of deployable systems. Taken together, these perspectives argue that credible stock prediction work depends less on one more novel architecture and more on transparent data pipelines, honest testing, and designs that could survive live trading.
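As an illustration of the third lens, the following minimal Python sketch shows one way that temporal cross-validation and explicit transaction costs might be combined. It is not taken from the reviewed studies. The function names `walk_forward_splits` and `net_returns`, the `embargo` gap, and the proportional cost model are all assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, embargo: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in time order (hypothetical helper).

    Training always ends `embargo` steps before the test block starts, so
    labels that look ahead by up to `embargo` steps cannot leak into testing.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_end = min(test_start + fold_size, n_samples)
        train_end = max(test_start - embargo, 0)
        yield range(0, train_end), range(test_start, test_end)


def net_returns(
    positions: List[float], asset_returns: List[float], cost_per_turnover: float
) -> List[float]:
    """Per-period strategy return after proportional costs (assumed model).

    positions[t] is the position held over period t, decided beforehand.
    Cost is charged on the absolute change in position, starting from flat.
    """
    out = []
    prev = 0.0
    for pos, ret in zip(positions, asset_returns):
        turnover = abs(pos - prev)
        out.append(pos * ret - cost_per_turnover * turnover)
        prev = pos
    return out


if __name__ == "__main__":
    for train, test in walk_forward_splits(n_samples=100, n_folds=4, embargo=5):
        print(f"train [0, {train.stop}) -> test [{test.start}, {test.stop})")
    print(net_returns([1.0, 1.0, 0.0], [0.01, -0.02, 0.03], 0.001))
```

The training window only ever grows forward in time, and the gap before each test block is a simple stand-in for the purging and embargo schemes used in more careful backtests. Liquidity limits and market impact are deliberately left out of the cost model.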
Copyright (c) 2026 Frontiers in Computing and Intelligent Systems

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

