Bridging Rationales and Relations: The Graph-Rationale-Guided Retrieval-Augmented Generation in Medical QA
DOI: https://doi.org/10.54097/vee3xx26

Keywords: Graph-Retrieval-Augmented Generation, Knowledge Graph, Medical Question Answering, Hallucination Detection, Quantized Low-Rank Adaptation

Abstract
Large language models (LLMs) face challenges of hallucination and knowledge obsolescence in medical question answering. Existing retrieval-augmented generation (RAG) frameworks can improve retrieval reliability through rationale guidance; however, their neglect of structured knowledge leads to insufficient relational reasoning. This paper proposes the Graph-Rationale-Guided Retrieval-Augmented Generation (GRAG) framework, which adds a knowledge graph layer on top of Rationale-Guided Retrieval-Augmented Generation (RAG²) to support dynamic graph query expansion, evidence fusion, hallucination detection, and quantized low-rank adaptation (QLoRA). GRAG's core mechanisms include rationale generation, entity extraction, graph construction based on the Unified Medical Language System (UMLS) and Neo4j, and similarity-driven multi-source evidence fusion. Experiments on the medical question answering dataset (MedQA), the medical multiple-choice question answering dataset (MedMCQA), and a self-constructed RareDisease-MedQuAD subset show that GRAG outperforms baseline models by approximately 10-12% in accuracy, reduces hallucination rate by around 20%, and achieves graph fidelity exceeding 80%. Ablation experiments further confirm that the knowledge graph (KG) and QLoRA modules each contribute approximately 5-8% to overall performance. Overall, GRAG bridges the gap between rationale guidance and structured retrieval, providing a more interpretable and reliable solution for MedQA systems.
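The similarity-driven multi-source evidence fusion and the graph-fidelity measure mentioned in the abstract can be illustrated with a minimal, self-contained sketch. The function names, toy embedding vectors, and top-k selection below are illustrative assumptions only, not the paper's implementation (which relies on sentence-transformer embeddings and a UMLS-backed Neo4j graph): evidence from the text retriever and the graph retriever is pooled, scored against the query embedding by cosine similarity, and ranked jointly, while graph fidelity is approximated as the Jaccard overlap between the entities in a generated answer and the entities grounded in the graph.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fuse_evidence(query_vec, text_passages, graph_triples, top_k=3):
    """Pool evidence from both retrievers, score each item against the
    query embedding, and keep the top-k items regardless of source.

    text_passages / graph_triples: lists of (item, embedding) pairs.
    Returns a ranked list of (source, item, score) tuples.
    """
    pool = [("text", item, vec) for item, vec in text_passages]
    pool += [("graph", item, vec) for item, vec in graph_triples]
    scored = [(src, item, cosine(query_vec, vec)) for src, item, vec in pool]
    scored.sort(key=lambda x: x[2], reverse=True)
    return scored[:top_k]

def graph_fidelity(answer_entities, graph_entities):
    """Jaccard overlap between answer entities and graph-grounded
    entities, used here as a simple proxy for graph fidelity."""
    a, b = set(answer_entities), set(graph_entities)
    return len(a & b) / len(a | b) if a | b else 1.0

if __name__ == "__main__":
    query = [1.0, 0.0]
    passages = [("passage-1", [1.0, 0.0]), ("passage-2", [0.0, 1.0])]
    triples = [("(aspirin)-[treats]->(headache)", [0.6, 0.8])]
    # The relevant passage ranks first; the graph triple ranks second.
    print(fuse_evidence(query, passages, triples, top_k=2))
    print(graph_fidelity({"aspirin", "headache"}, {"aspirin", "migraine"}))
```

A real deployment would replace the toy vectors with dense embeddings and draw `graph_triples` from Cypher queries over the Neo4j store, but the ranking and overlap logic stays the same shape.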
License
Copyright (c) 2026 Academic Journal of Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.
