Investor Sentiment Index Based on Large Language Models and Its Predictive Analysis for the Shanghai Composite Index
DOI:
https://doi.org/10.54097/aqe8s684
Keywords:
Large Language Models, BERT, Investor Sentiment Index, Shanghai Composite Index, Stock Forecasting.
Abstract
In financial markets, investor sentiment often amplifies or dampens the effect of new information on prices. Recent Large Language Models such as Bidirectional Encoder Representations from Transformers (BERT) have shown stronger predictive ability than traditional methods in forecasting market trends. This study builds an investor sentiment index with BERT and tests how well it predicts the Shanghai Composite Index. Posts were collected from the East Money Stock Forum, and the BERT model was fine-tuned to classify and score their sentiment, yielding a daily sentiment time series. The predictive power of the index was assessed in two ways: conventional econometric regression analysis, and a multi-scale forecasting framework that combines Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Long Short-Term Memory (LSTM) networks. The empirical results indicate that the BERT-based sentiment index captures variations in investor sentiment and improves both the explanation and the prediction of stock returns and the direction of market movement. These findings suggest that Large Language Models are useful for sentiment analysis of financial text and offer investors and regulators a practical tool for using sentiment-related insights.
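The abstract does not state how post-level classifications are aggregated into the daily series. As a minimal sketch only, the Python below assumes each post already carries a label from the fine-tuned classifier and aggregates the labels with a log ratio of bullish to bearish post counts, a common choice in the message-board literature. The function name `daily_sentiment_index`, the label strings, and the sample dates are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def daily_sentiment_index(posts):
    """Aggregate labelled posts into a daily sentiment score.

    posts: iterable of (date_string, label) pairs, where label is
    "positive", "negative", or "neutral" (assumed label names).
    Returns a dict mapping each date to ln((1 + n_pos) / (1 + n_neg)).
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for day, label in posts:
        bucket = counts[day]  # registers the day even if every post is neutral
        if label in bucket:   # neutral posts do not change either count
            bucket[label] += 1
    # Adding 1 to each count keeps the ratio defined on days with no
    # positive or no negative posts.
    return {
        day: math.log((1 + c["positive"]) / (1 + c["negative"]))
        for day, c in sorted(counts.items())
    }


# Hypothetical sample data for illustration only.
posts = [
    ("2024-01-02", "positive"),
    ("2024-01-02", "positive"),
    ("2024-01-02", "negative"),
    ("2024-01-03", "negative"),
    ("2024-01-03", "neutral"),
]
index = daily_sentiment_index(posts)
for day, score in index.items():
    print(day, round(score, 3))
```

A positive score marks a day on which bullish posts outnumber bearish ones, and a negative score the reverse. The resulting series is the kind of input that a regression or a decomposition-based forecasting model would take, though the paper's own scoring rule may differ.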
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

