(1) Yang, S. A Diagnosing Untruthfulness: A G-Eval and Bootstrap Analysis of LLM Failure Modes on TruthfulQA. MMAA 2026, 9 (1), 652-657. https://doi.org/10.54097/vtwqfr43.