@inproceedings{figueiredo-etal-2026-grounded,
title = "Grounded in Law: A Multi-Stage Anti-Hallucination Pipeline for Legal {RAG} Systems in {B}razilian {P}ortuguese",
author = "Figueiredo, Arla and
Lucas, Jo{\~a}o and
Ribeiro, Tatiana and
Nery, Caio and
Rios, Alan and
Hebert, Caio and
Florentino, Luiza and
Silva, Arthur and
Feyerabend, {\'I}caro and
Vidal, Pedro and
Cabral, Bruno",
editor = "Souza, Marlo and
de-Dios-Flores, Iria and
Santos, Diana and
Freitas, Larissa and
Souza, Jackson Wilke da Cruz and
Ribeiro, Eug{\'e}nio",
booktitle = "Proceedings of the 17th International Conference on Computational Processing of {P}ortuguese ({PROPOR} 2026) - Vol. 2",
month = apr,
year = "2026",
address = "Salvador, Brazil",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.propor-2.9/",
pages = "30--34",
ISBN = "979-8-89176-387-6",
abstract = "Large Language Models (LLMs) are effective text generators but create legal citations at non-trivial rates, a failure mode with serious consequences in legal practice. In Brazilian Portuguese the risk is amplified by citation variability (juridiqu{\^e}s), fragment-level references (article {\textrightarrow} paragraph {\textrightarrow} item), and the need to distinguish jurisdictions and court instances. We describe a production Retrieval-Augmented Generation (RAG) system deployed at a Brazilian legal-technology platform. The system combines (1) domain-tuned hybrid retrieval (lexical, dense, and cross-encoder reranking) over a large-scale legal corpus; (2) grounded generation with explicit citation constraints; and (3) a post-generation Reference Audit layer that extracts legislation and jurisprudence mentions via specialized taggers, normalizes them to a canonical schema, checks existence against authoritative databases at fragment granularity, verifies fidelity against official texts, and triggers targeted rewrites when inconsistencies are detected. We report production telemetry from 184,895 audited answers containing 43,175 extracted legal references. Legislation references resolve at 81.7{\%}, while jurisprudence references resolve at only 47.1{\%}, identifying case-law normalization as the primary bottleneck for practitioners. Fidelity verification corrected 6.5{\%} of checked answers before delivery, preventing misrepresented legal claims from reaching end users. By converting silent hallucinations into explicit warnings with per-reference status, the system enables legal professionals to trust verified citations and efficiently review flagged ones, rather than manually checking every authority."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="figueiredo-etal-2026-grounded">
<titleInfo>
<title>Grounded in Law: A Multi-Stage Anti-Hallucination Pipeline for Legal RAG Systems in Brazilian Portuguese</title>
</titleInfo>
<name type="personal">
<namePart type="given">Arla</namePart>
<namePart type="family">Figueiredo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">João</namePart>
<namePart type="family">Lucas</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tatiana</namePart>
<namePart type="family">Ribeiro</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Caio</namePart>
<namePart type="family">Nery</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alan</namePart>
<namePart type="family">Rios</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Caio</namePart>
<namePart type="family">Hebert</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Luiza</namePart>
<namePart type="family">Florentino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Arthur</namePart>
<namePart type="family">Silva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ícaro</namePart>
<namePart type="family">Feyerabend</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pedro</namePart>
<namePart type="family">Vidal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bruno</namePart>
<namePart type="family">Cabral</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2026-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2</title>
</titleInfo>
<name type="personal">
<namePart type="given">Marlo</namePart>
<namePart type="family">Souza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Iria</namePart>
<namePart type="family">de-Dios-Flores</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Diana</namePart>
<namePart type="family">Santos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Larissa</namePart>
<namePart type="family">Freitas</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jackson</namePart>
<namePart type="given">Wilke</namePart>
<namePart type="given">da</namePart>
<namePart type="given">Cruz</namePart>
<namePart type="family">Souza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eugénio</namePart>
<namePart type="family">Ribeiro</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Salvador, Brazil</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-387-6</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) are effective text generators but create legal citations at non-trivial rates, a failure mode with serious consequences in legal practice. In Brazilian Portuguese the risk is amplified by citation variability (juridiquês), fragment-level references (article → paragraph → item), and the need to distinguish jurisdictions and court instances. We describe a production Retrieval-Augmented Generation (RAG) system deployed at a Brazilian legal-technology platform. The system combines (1) domain-tuned hybrid retrieval (lexical, dense, and cross-encoder reranking) over a large-scale legal corpus; (2) grounded generation with explicit citation constraints; and (3) a post-generation Reference Audit layer that extracts legislation and jurisprudence mentions via specialized taggers, normalizes them to a canonical schema, checks existence against authoritative databases at fragment granularity, verifies fidelity against official texts, and triggers targeted rewrites when inconsistencies are detected. We report production telemetry from 184,895 audited answers containing 43,175 extracted legal references. Legislation references resolve at 81.7%, while jurisprudence references resolve at only 47.1%, identifying case-law normalization as the primary bottleneck for practitioners. Fidelity verification corrected 6.5% of checked answers before delivery, preventing misrepresented legal claims from reaching end users. By converting silent hallucinations into explicit warnings with per-reference status, the system enables legal professionals to trust verified citations and efficiently review flagged ones, rather than manually checking every authority.</abstract>
<identifier type="citekey">figueiredo-etal-2026-grounded</identifier>
<location>
<url>https://aclanthology.org/2026.propor-2.9/</url>
</location>
<part>
<date>2026-04</date>
<extent unit="page">
<start>30</start>
<end>34</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Grounded in Law: A Multi-Stage Anti-Hallucination Pipeline for Legal RAG Systems in Brazilian Portuguese
%A Figueiredo, Arla
%A Lucas, João
%A Ribeiro, Tatiana
%A Nery, Caio
%A Rios, Alan
%A Hebert, Caio
%A Florentino, Luiza
%A Silva, Arthur
%A Feyerabend, Ícaro
%A Vidal, Pedro
%A Cabral, Bruno
%Y Souza, Marlo
%Y de-Dios-Flores, Iria
%Y Santos, Diana
%Y Freitas, Larissa
%Y Souza, Jackson Wilke da Cruz
%Y Ribeiro, Eugénio
%S Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
%D 2026
%8 April
%I Association for Computational Linguistics
%C Salvador, Brazil
%@ 979-8-89176-387-6
%F figueiredo-etal-2026-grounded
%X Large Language Models (LLMs) are effective text generators but create legal citations at non-trivial rates, a failure mode with serious consequences in legal practice. In Brazilian Portuguese the risk is amplified by citation variability (juridiquês), fragment-level references (article → paragraph → item), and the need to distinguish jurisdictions and court instances. We describe a production Retrieval-Augmented Generation (RAG) system deployed at a Brazilian legal-technology platform. The system combines (1) domain-tuned hybrid retrieval (lexical, dense, and cross-encoder reranking) over a large-scale legal corpus; (2) grounded generation with explicit citation constraints; and (3) a post-generation Reference Audit layer that extracts legislation and jurisprudence mentions via specialized taggers, normalizes them to a canonical schema, checks existence against authoritative databases at fragment granularity, verifies fidelity against official texts, and triggers targeted rewrites when inconsistencies are detected. We report production telemetry from 184,895 audited answers containing 43,175 extracted legal references. Legislation references resolve at 81.7%, while jurisprudence references resolve at only 47.1%, identifying case-law normalization as the primary bottleneck for practitioners. Fidelity verification corrected 6.5% of checked answers before delivery, preventing misrepresented legal claims from reaching end users. By converting silent hallucinations into explicit warnings with per-reference status, the system enables legal professionals to trust verified citations and efficiently review flagged ones, rather than manually checking every authority.
%U https://aclanthology.org/2026.propor-2.9/
%P 30-34
Markdown (Informal)
[Grounded in Law: A Multi-Stage Anti-Hallucination Pipeline for Legal RAG Systems in Brazilian Portuguese](https://aclanthology.org/2026.propor-2.9/) (Figueiredo et al., PROPOR 2026)
ACL
- Arla Figueiredo, João Lucas, Tatiana Ribeiro, Caio Nery, Alan Rios, Caio Hebert, Luiza Florentino, Arthur Silva, Ícaro Feyerabend, Pedro Vidal, and Bruno Cabral. 2026. Grounded in Law: A Multi-Stage Anti-Hallucination Pipeline for Legal RAG Systems in Brazilian Portuguese. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2, pages 30–34, Salvador, Brazil. Association for Computational Linguistics.