Democratizing Legal Analytics: Resource-Efficient Information Extraction for Brazilian Case Law

Rodrigo Filippi Dornelles


Abstract
Legal systems produce large volumes of high-stakes decisions in unstructured natural language, making large-scale empirical analysis costly, difficult to reproduce, and unevenly accessible. This bottleneck is especially acute for legal analytics and policy evaluation in low-resource languages such as Portuguese. To address it, we present a resource-efficient pipeline for information extraction from Brazilian criminal case law that reuses a legacy dataset to fine-tune open-weight LLMs with Q-LoRA. Operating in a small-data setting and using schema-constrained JSON generation, the pipeline extracts 47 legal variables spanning charges, evidence, and sentencing outcome. In held-out evaluation, a fine-tuned Phi-4 (14B) model achieves 92.8% accuracy and 0.826 macro-F1, approaching proprietary baselines while retaining the cost and privacy benefits of local deployment. We then use the extracted data in a case study of the short-term effects of a recent Brazilian Supreme Court ruling on drug decriminalization, finding no statistically significant change in trafficking-conviction rates (p≥0.05), a pattern consistent with short-run institutional inertia. More broadly, the paper contributes a reproducible framework for legal NLP and shows how legacy empirical datasets can support scalable legal analytics under severe resource constraints.
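
The abstract reports accuracy and macro-F1 side by side, and the two can diverge when a variable's classes are imbalanced. The sketch below is an illustration only, not the paper's evaluation code: the function names and the toy labels are assumptions, and the numbers it produces are unrelated to the reported 92.8% and 0.826.

```python
def accuracy(gold, pred):
    """Fraction of positions where the prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical binary variable: nine "no" cases and one "yes" case.
gold = ["no"] * 9 + ["yes"]
# A model that always answers "no" misses the single rare case.
pred = ["no"] * 10

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

In this toy case the always-"no" predictor scores high on accuracy but much lower on macro-F1, because the missed rare class contributes an F1 of zero to the average.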
Anthology ID:
2026.propor-1.103
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
1011–1020
URL:
https://aclanthology.org/2026.propor-1.103/
Cite (ACL):
Rodrigo Filippi Dornelles. 2026. Democratizing Legal Analytics: Resource-Efficient Information Extraction for Brazilian Case Law. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 1011–1020, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
Democratizing Legal Analytics: Resource-Efficient Information Extraction for Brazilian Case Law (Dornelles, PROPOR 2026)
PDF:
https://aclanthology.org/2026.propor-1.103.pdf