Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering

Yinghao Hu, Leilei Gan, Wenyi Xiao, Kun Kuang, Fei Wu


Abstract
Hallucination, the generation of incorrect or fabricated information, remains a critical challenge for large language models (LLMs), particularly in high-stakes domains such as legal question answering (QA). To mitigate hallucinations in legal QA, we first introduce a benchmark called LegalHalBench and three automatic metrics to evaluate the common hallucinations that arise when LLMs answer legal questions. We then propose a hallucination mitigation method that integrates behavior cloning with a novel Hard Sample-aware Iterative Direct Preference Optimization (HIPO). We conduct extensive real-data experiments to validate the effectiveness of our approach. Our results demonstrate remarkable improvements on various metrics, including the newly proposed Non-Hallucinated Statute Rate, Statute Relevance Rate, and Legal Claim Truthfulness, as well as traditional metrics such as METEOR, BERTScore, ROUGE-L, and win rates.
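The HIPO method described in the abstract builds on Direct Preference Optimization (DPO). For orientation only, below is a minimal sketch of the standard DPO objective (Rafailov et al., 2023) on which such iterative variants are typically based; it is not the paper's hard sample-aware formulation, and the tensor names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (not the paper's HIPO variant).

    Each argument is a batch of per-sequence log-probabilities
    (summed over tokens) of the preferred ("chosen") and
    dispreferred ("rejected") answers under the trainable policy
    and the frozen reference model.
    """
    # Implicit reward of each answer: scaled log-ratio to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-likelihood that the chosen answer is ranked above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

An iterative scheme such as HIPO would repeatedly regenerate preference pairs from the current policy and re-apply an objective of this kind, with additional weighting for hard samples; see the paper for the exact formulation.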
Anthology ID:
2025.coling-main.298
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
4410–4427
URL:
https://aclanthology.org/2025.coling-main.298/
Cite (ACL):
Yinghao Hu, Leilei Gan, Wenyi Xiao, Kun Kuang, and Fei Wu. 2025. Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4410–4427, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering (Hu et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.298.pdf