Large Language Models for Financial Knowledge Extraction, Analytical Insights, and Corporate Planning Support

Xuguang Zhang; Mengdie Wang

doi:10.54097/7am6vk38

Keywords:

Large Language Models; Financial Knowledge Extraction; Natural Language Processing; Transformer Architecture; Corporate Planning; Sentiment Analysis; Question Answering; Retrieval-Augmented Generation; Financial Analytics; Strategic Decision Support

Abstract

Large language models (LLMs) have emerged as transformative technologies in financial services, showing strong capabilities in extracting structured knowledge from unstructured financial documents, generating analytical insights, and supporting strategic corporate planning decisions. This review examines applications of LLMs, including GPT-4, Claude, PaLM, and domain-specific financial models, in automating knowledge extraction from sources such as earnings calls, financial reports, regulatory filings, and market commentary. We analyze how transformer-based architectures use attention mechanisms and contextual embeddings to interpret complex financial terminology, temporal relationships, and causal connections in financial narratives. The paper explores techniques that adapt general-purpose LLMs to specialized financial tasks, including prompt engineering, few-shot learning, retrieval-augmented generation (RAG), and fine-tuning. We examine applications in sentiment analysis of financial texts, automatic summarization of lengthy reports, entity recognition for companies and products, relationship extraction between financial events, and question-answering systems for financial queries. The review investigates how LLMs generate analytical insights through scenario analysis, trend identification, risk assessment, and competitive intelligence synthesis. We analyze corporate planning support applications including strategic initiative identification, market opportunity analysis, resource allocation recommendations, and investment thesis generation. Furthermore, we discuss integration architectures that combine LLMs with structured databases, time-series models, and visualization tools to create comprehensive decision support systems.
The paper addresses critical challenges including hallucination mitigation, accuracy verification, regulatory compliance, data privacy concerns, and the need for human oversight in high-stakes financial decisions. We examine evaluation methodologies for financial LLM applications, including domain-specific benchmarks, expert assessment protocols, and real-world performance metrics. Through synthesis of current research and deployed systems, we identify limitations including computational costs, update frequency challenges, bias in training data, and difficulties in explaining model reasoning. The review concludes by outlining promising research directions including multimodal financial analysis, real-time information integration, federated learning for privacy-preserving collaboration, and neuro-symbolic approaches combining neural language understanding with formal financial reasoning.
