A Scalable Framework for Legal Text Understanding in Regulatory and Financial Contexts.

Santiago Martínez, Juan Manuel Castañeda, Ruben Manrique


Abstract
This study presents a comprehensive approach to developing a domain-specific large language model (LLM) for regulatory and financial text interpretation. A specialized corpus was constructed through large-scale scraping of financial and regulatory documents across domains such as compliance, licensing, and financial reporting. The data was preprocessed using GPT-4o-mini with prompt engineering to retain critical information and remove noise. We further pre-trained a LLaMA-3.1-8B model on the curated corpus and then fine-tuned it on an instruction dataset covering nine tasks from the COLING 2025 Regulations Challenge, including acronym expansion, regulatory question-answering, and XBRL-based financial analytics. Fine-tuning used QLoRA to reduce memory requirements. The model shows a slight improvement over the baseline in answering complex regulatory questions (detailed QA) and in expanding acronyms. This study demonstrates the potential of domain-specific LLMs in regulatory text interpretation and lays the groundwork for future research in specialized NLP evaluation methodologies.
Anthology ID:
2025.finnlp-1.39
Volume:
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Chung-Chi Chen, Antonio Moreno-Sandoval, Jimin Huang, Qianqian Xie, Sophia Ananiadou, Hsin-Hsi Chen
Venues:
FinNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
326–334
URL:
https://aclanthology.org/2025.finnlp-1.39/
Cite (ACL):
Santiago Martínez, Juan Manuel Castañeda, and Ruben Manrique. 2025. A Scalable Framework for Legal Text Understanding in Regulatory and Financial Contexts. In Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal), pages 326–334, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
A Scalable Framework for Legal Text Understanding in Regulatory and Financial Contexts. (Martínez et al., FinNLP 2025)
PDF:
https://aclanthology.org/2025.finnlp-1.39.pdf