Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA

Kyubyung Chae; Jewon Yeom; Jeongjae Park; Seunghyun Bae; Ijun Jang; Hyunbin Jin; Jinkwan Jang; Taesup Kim

Abstract

Legal QA benchmarks have predominantly focused on case law, overlooking the unique challenges of statute-centric regulatory reasoning. In statutory domains, relevant evidence is distributed across hierarchically linked documents, creating a statutory retrieval gap where conventional retrievers fail and models often hallucinate under incomplete context. We introduce SearchFireSafety, a structure- and safety-aware benchmark for statute-centric legal QA. Instantiated on fire-safety regulations as a representative case, the benchmark evaluates whether models can retrieve hierarchically fragmented evidence and safely abstain when statutory context is insufficient. SearchFireSafety adopts a dual-track evaluation framework combining real-world questions that require citation-aware retrieval and synthetic partial-context scenarios that stress-test hallucination and refusal behavior. Experiments across multiple large language models show that graph-guided retrieval substantially improves performance, but also reveal a critical safety trade-off: domain-adapted models are more likely to hallucinate when key statutory evidence is missing. Our findings highlight the need for benchmarks that jointly evaluate hierarchical retrieval and model safety in statute-centric regulatory settings.
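The abstract reports that graph-guided retrieval helps when evidence is spread across hierarchically linked statutes. One common way to realize this idea is to start from the provisions a conventional retriever returns and follow explicit citation or delegation links to collect the lower-level provisions they point to. The sketch below is a minimal illustration under stated assumptions, not the benchmark's or the authors' retriever. The toy citation graph, the `Act:` / `Decree:` / `Rule:` identifiers, the `expand_with_citations` helper, and the `max_hops` limit are all hypothetical and chosen for illustration.

```python
from collections import deque

# Hypothetical toy citation graph: each provision maps to the provisions it
# explicitly references (e.g., an article that delegates details to a
# lower-level decree or rule). All identifiers are invented for illustration.
CITATION_GRAPH: dict[str, list[str]] = {
    "Act:Art.9": ["Decree:Art.15"],
    "Decree:Art.15": ["Rule:Art.6", "Decree:Annex5"],
    "Rule:Art.6": [],
    "Decree:Annex5": [],
    "Act:Art.12": [],
}


def expand_with_citations(
    seeds: list[str], graph: dict[str, list[str]], max_hops: int = 2
) -> list[str]:
    """Breadth-first expansion from seed provisions along citation edges.

    Returns provisions in discovery order (seeds first), without duplicates,
    following at most `max_hops` citation links away from any seed.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue: deque[tuple[str, int]] = deque()

    # Enqueue each distinct seed at depth 0.
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            order.append(seed)
            queue.append((seed, 0))

    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            # Do not follow citations beyond the hop limit.
            continue
        for cited in graph.get(node, []):
            if cited not in seen:
                seen.add(cited)
                order.append(cited)
                queue.append((cited, depth + 1))
    return order


if __name__ == "__main__":
    # A retriever that only surfaced the top-level article would miss the
    # delegated provisions; expansion pulls them into the context.
    print(expand_with_citations(["Act:Art.9"], CITATION_GRAPH))
    print(expand_with_citations(["Act:Art.9"], CITATION_GRAPH, max_hops=1))
```

With the default two hops, expanding from `Act:Art.9` yields `["Act:Art.9", "Decree:Art.15", "Rule:Art.6", "Decree:Annex5"]`, while `max_hops=1` stops at `["Act:Art.9", "Decree:Art.15"]`. Limiting the hop count is one simple way to trade recall of fragmented evidence against context length. Conversely, deliberately dropping one of the expanded provisions from the context is one way to picture the partial-context setting in which a model should abstain rather than guess.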

Anthology ID: 2026.acl-long.2112
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45553–45573
URL: https://aclanthology.org/2026.acl-long.2112/
Cite (ACL): Kyubyung Chae, Jewon Yeom, Jeongjae Park, Seunghyun Bae, Ijun Jang, Hyunbin Jin, Jinkwan Jang, and Taesup Kim. 2026. Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45553–45573, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA (Chae et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2112.pdf
Checklist: 2026.acl-long.2112.checklist.pdf
