Measuring the Groundedness of Legal Question-Answering Systems

Dietrich Trautmann, Natalia Ostapuk, Quentin Grail, Adrian Pol, Guglielmo Bonifazi, Shang Gao, Martin Gajek


Abstract
In high-stakes domains such as legal question-answering, the accuracy and trustworthiness of generative AI systems are of paramount importance. This work presents a comprehensive benchmark of methods for assessing the groundedness of AI-generated responses, aiming to significantly enhance their reliability. Our experiments include similarity-based metrics and natural language inference models to evaluate whether responses are well-founded in the given contexts. We also explore different prompting strategies for large language models to improve the detection of ungrounded responses. We validate the effectiveness of these methods on a newly created grounding classification corpus, designed specifically for legal queries and corresponding responses from retrieval-augmented prompting, focusing on their alignment with source material. Our results show promise for groundedness classification of generated responses, with the best method achieving a macro-F1 score of 0.8. We additionally evaluate the methods' latency to determine their suitability for real-world applications, as this step typically follows the generation process and may trigger additional manual verification or automated response regeneration. In summary, this study demonstrates the potential of various detection methods to improve the trustworthiness of generative AI in legal settings.
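To make the similarity-based approach concrete, here is a minimal illustrative sketch (not the paper's actual implementation, and far simpler than the benchmarked methods): a lexical-overlap score that measures how much of a generated response's vocabulary is covered by the retrieved context, thresholded into a binary grounded/ungrounded label. The tokenizer, scoring function, and threshold value are all assumptions chosen for illustration.

```python
# Illustrative sketch of a similarity-based groundedness signal.
# NOT the paper's method: a real system would use embedding similarity,
# NLI models, or LLM prompting, as benchmarked in the paper.
import re


def tokenize(text: str) -> set:
    """Lowercase word tokens; a deliberately minimal tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(response: str, context: str) -> float:
    """Fraction of response tokens that also appear in the context.

    A score near 1.0 suggests the response is lexically grounded in the
    source material; a low score flags content absent from the context.
    """
    resp, ctx = tokenize(response), tokenize(context)
    if not resp:
        return 0.0
    return len(resp & ctx) / len(resp)


def is_grounded(response: str, context: str, threshold: float = 0.7) -> bool:
    """Binary groundedness label; the cutoff is a hypothetical value
    that would in practice be tuned on a labeled corpus."""
    return grounding_score(response, context) >= threshold


context = "The statute of limitations for breach of contract is four years."
grounded = "The limitations period for breach of contract is four years."
ungrounded = "The penalty includes imprisonment of up to ten years."

print(is_grounded(grounded, context))    # high lexical overlap
print(is_grounded(ungrounded, context))  # mostly unsupported content
```

Because this check runs after generation, its latency matters in deployment; a cheap lexical filter like this could gate a slower NLI or LLM-based verifier, matching the paper's motivation for measuring per-method latency.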
Anthology ID:
2024.nllp-1.14
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
176–186
URL:
https://aclanthology.org/2024.nllp-1.14
Cite (ACL):
Dietrich Trautmann, Natalia Ostapuk, Quentin Grail, Adrian Pol, Guglielmo Bonifazi, Shang Gao, and Martin Gajek. 2024. Measuring the Groundedness of Legal Question-Answering Systems. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 176–186, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Measuring the Groundedness of Legal Question-Answering Systems (Trautmann et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.14.pdf