Reliable Stock Prediction: Data, Models, Testing

Authors

  • Guanglin Xie, Guanghua Cambridge International School, Shanghai, China

DOI:

https://doi.org/10.54097/rk4ss359

Keywords:

stock prediction; time series; multimodal; backtest overfitting; reproducibility; execution costs.

Abstract

In recent years, deep learning and large language models have entered almost every discussion of stock price prediction. Many reported results look strong on paper, but they often rely on clean data, cheap trading, and generous assumptions that rarely hold in real markets. This review examines studies from 2020–2025 through three practical lenses. First, data and task design: how prices, order books, and news are collected, filtered, labeled, and aligned with the information actually available at decision time. Second, models and multimodal methods: long-horizon forecasters, order-book-based models, and text–market fusion schemes, all compared against simple but competitive baselines rather than strawman benchmarks. Third, evaluation and implementation: temporal cross-validation, tests for backtest overfitting, explicit treatment of transaction costs and liquidity, and the engineering and compliance constraints of deployable systems. Taken together, these perspectives argue that credible stock prediction work depends less on yet another novel architecture and more on transparent data pipelines, honest testing, and designs that could survive live trading.
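
As a rough illustration of two evaluation ideas named in the abstract, the sketch below shows one possible way to build time-ordered train/test splits with an embargo gap and to charge proportional trading costs on position changes. It is not taken from the article: the function names `walk_forward_splits` and `net_returns`, the expanding-window layout, the `embargo` parameter, and the flat `cost_rate` cost model are all assumptions chosen for brevity.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, embargo: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in time order (hypothetical helper).

    Training data always ends at least `embargo` steps before the test
    block starts, so labels that look ahead cannot leak into training.
    """
    fold_size = n_samples // (n_folds + 1)
    if fold_size <= embargo:
        raise ValueError("not enough samples for this fold count and embargo")
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        # The last fold absorbs any remainder samples.
        test_end = n_samples if k == n_folds else test_start + fold_size
        yield range(0, test_start - embargo), range(test_start, test_end)


def net_returns(
    positions: List[float], asset_returns: List[float], cost_rate: float
) -> List[float]:
    """Per-period strategy return after proportional costs (assumed model).

    positions[t] is the weight held over period t, decided before the
    period starts; cost is charged on the absolute change in weight.
    """
    out = []
    prev = 0.0  # start flat, so the first trade also pays costs
    for pos, ret in zip(positions, asset_returns):
        turnover = abs(pos - prev)
        out.append(pos * ret - cost_rate * turnover)
        prev = pos
    return out


if __name__ == "__main__":
    for train, test in walk_forward_splits(100, n_folds=4, embargo=5):
        print(f"train {train.start}..{train.stop - 1}, "
              f"test {test.start}..{test.stop - 1}")
    print(net_returns([1.0, 1.0, 0.0], [0.01, -0.02, 0.03], cost_rate=0.001))
```

Real studies would typically replace the flat cost rate with spread, slippage, and liquidity-dependent terms, and would size the embargo to match the label horizon.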

Published

27-03-2026

Issue

Vol. 16 No. 1 (2026)

Section

Articles

How to Cite

Xie, G. (2026). Reliable Stock Prediction: Data, Models, Testing. Frontiers in Computing and Intelligent Systems, 16(1), 133-138. https://doi.org/10.54097/rk4ss359