Research and Analysis on the Mechanism of Suppressing Large Model Hallucination Based on Modular RAG Architecture

Authors

  • Zihan Lin
  • Xinle Yang

DOI:

https://doi.org/10.54097/bp841717

Keywords:

Large Language Models, RAG Architecture, Retrieval, Refinement, Generation

Abstract

Large Language Models (LLMs) perform remarkably well on knowledge-intensive tasks, yet 'hallucinations'—outputs that contradict factual information—make them difficult to deploy in high-stakes scenarios. Although Retrieval-Augmented Generation (RAG) is widely regarded as the mainstream approach to mitigating hallucinations, most existing studies treat it as a black box and rarely analyze the heterogeneous functions of its internal modules. To address this, we propose a 'functionally decomposed' modular RAG taxonomy that divides the pipeline into three stages—retrieval, refinement, and generation—and derives three technical pathways from them: Direct Injection RAG (DI-RAG), Relevance-Focused RAG (RF-RAG), and Fact-Checking RAG (FC-RAG). Using the large-scale real-world Q&A dataset MS MARCO, we construct four benchmarks that simulate high-risk scenarios such as information noise, knowledge conflicts, and outdated knowledge. With Qwen1.5-7B-Chat as the generative backbone, we systematically evaluate the marginal benefit of each architecture in suppressing hallucinations and quantify the contributions of components such as re-rankers and multi-query verifiers in specific hallucination scenarios, providing actionable empirical guidance and optimization pathways for building high-fidelity RAG systems.
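The three-stage decomposition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the corpus, the term-overlap scorer, the threshold re-ranker, and the abstain-on-no-evidence rule are all toy stand-ins for the retrieval, refinement (RF-RAG re-ranking), and verified generation (FC-RAG) stages, respectively.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    score: float = 0.0


def retrieve(query: str, corpus: list[str], k: int = 3) -> list[Passage]:
    """Retrieval stage: score passages by query-term overlap, keep top-k."""
    q_terms = set(query.lower().split())
    scored = [Passage(p, len(q_terms & set(p.lower().split()))) for p in corpus]
    return sorted(scored, key=lambda p: p.score, reverse=True)[:k]


def refine(passages: list[Passage], threshold: float = 1.0) -> list[Passage]:
    """Refinement stage: a stand-in re-ranker that drops low-relevance passages."""
    return [p for p in passages if p.score >= threshold]


def generate(passages: list[Passage]) -> str:
    """Generation stage: answer only from retained evidence; abstain otherwise
    (a crude proxy for a fact-checking verifier refusing to hallucinate)."""
    if not passages:
        return "INSUFFICIENT EVIDENCE"
    return passages[0].text  # extractive "generation" for illustration


corpus = [
    "RAG retrieves external documents to ground model outputs.",
    "Bananas are rich in potassium.",
    "Re-rankers filter noisy retrieval results before generation.",
]
query = "how does RAG ground outputs"
answer = generate(refine(retrieve(query, corpus)))
```

The abstain branch is what distinguishes the FC-RAG-style pathway in this sketch: when refinement filters out all evidence, the generator refuses rather than inventing an answer, which is the hallucination-suppression behavior the abstract attributes to verification components.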

Published

15-03-2026

Section

Articles

How to Cite

Lin, Z., & Yang, X. (2026). Research and Analysis on the Mechanism of Suppressing Large Model Hallucination Based on Modular RAG Architecture. Mathematical Modeling and Algorithm Application, 9(1), 545-552. https://doi.org/10.54097/bp841717