Large Language Models for Mathematical Problem Solving: Applications, Challenges and Future Directions

Authors

  • Yuxuan Wan

DOI:

https://doi.org/10.54097/0pbd1t53

Keywords:

Large Language Model, Mathematical Problem Solving, Deep Learning.

Abstract

Large language models (LLMs) have made remarkable strides in natural language tasks, but mathematical problem solving remains a significant challenge. This paper surveys recent methods developed to enhance the mathematical reasoning abilities of LLMs. Techniques such as chain-of-thought prompting, self-consistency decoding, retrieval-based augmentation, and tool use have markedly improved performance on math benchmarks. Despite this progress, LLMs still struggle with reasoning faithfulness, generalization to new problems, and arithmetic accuracy. This paper identifies three key challenges and discusses corresponding future directions, including structured reasoning formats, automatic verification, and targeted training. Bridging the gap between current LLM performance and human-level mathematical proficiency will not only advance artificial intelligence (AI) capabilities in math but also improve the transparency and reliability of reasoning in AI systems more broadly. Ultimately, strengthening LLMs in mathematics represents a step toward AI systems that can provide trustworthy, interpretable, and high-value support in education, scientific research, and other domains requiring precise reasoning.
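To make the self-consistency decoding technique mentioned above concrete, the following is a minimal sketch of its aggregation step: several chain-of-thought completions are sampled independently for the same problem, the final answer is extracted from each, and the most frequent answer wins by majority vote. The model call itself is stubbed out here; the `samples` list is a hypothetical set of parsed final answers, not output from any real system.

```python
from collections import Counter

def self_consistency_answer(sampled_answers):
    """Majority vote over the final answers extracted from several
    independently sampled chain-of-thought completions."""
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers parsed from five sampled reasoning chains
# for one math word problem; in practice these come from an LLM sampled
# with nonzero temperature.
samples = ["42", "42", "41", "42", "40"]
print(self_consistency_answer(samples))  # → "42"
```

The key design choice is that voting happens over final answers rather than full reasoning chains, so different derivations that reach the same result reinforce each other.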

References

[1] J. Wei, et al., Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).

[2] T. Kojima, et al., Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 35, 22199–22213 (2022).

[3] X. Wang, et al., Self-consistency improves chain of thought reasoning in language models. Proc. Int. Conf. Learn. Represent. (ICLR 2023).

[4] Z. Wang, et al., RAT: Retrieval-augmented thoughts elicit context-aware reasoning in long-horizon generation. arXiv: 2403.05313 (2024).

[5] L. Gao, et al., PAL: Program-aided language models. Proc. 40th Int. Conf. Mach. Learn. (ICML 2023).

[6] A. Zhou, et al., Solving challenging math word problems using GPT-4 code interpreter with code-based self-verification. arXiv: 2308.07921 (2023).

[7] K. Cobbe, et al., Training verifiers to solve math word problems. arXiv: 2110.14168 (2021) (presented at ICLR 2022).

[8] D. Zhou, et al., Least-to-most prompting enables complex reasoning in large language models. arXiv: 2205.10625 (2022).

[9] P. Linardatos, V. Papastefanopoulos, S. Kotsiantis, Explainable AI: A review of machine learning interpretability methods. Entropy 23, 18 (2020).

[10] V. Vishwarupe, P.M. Joshi, N. Mathias, S. Maheshwari, S. Mhaisalkar, V. Pawar, Explainable AI and interpretable machine learning: A case study in perspective. Procedia Comput. Sci. 204, 869–876 (2022).

Published

29-01-2026

Issue

Section

Articles

How to Cite

Wan, Y. (2026). Large Language Models for Mathematical Problem Solving: Applications, Challenges and Future Directions. Academic Journal of Science and Technology, 19(2), 424-431. https://doi.org/10.54097/0pbd1t53