A REGNLP Framework: Developing Retrieval-Augmented Generation for Regulatory Document Analysis

Ozan Bayer, Elif Nehir Ulu, Yasemin Sarkın, Ekrem Sütçü, Defne Buse Çelik, Alper Karamanlıoğlu, İsmail Karakaya, Berkan Demirel


Abstract
This study presents a Retrieval-Augmented Generation (RAG) framework tailored to analyzing regulatory documents from the Abu Dhabi Global Market (ADGM). The methodology covers data preprocessing, including extraction, cleaning, and compression of documents, as well as the organization of the ObliQA dataset. An embedding model generates the embeddings used in the retrieval phase, and the txtai library manages those embeddings and streamlines testing. Training incorporated duplicate recognition, dropout, pooling adjustments, and label modifications to improve retrieval performance. Hyperparameter tuning further refined the retrieval component, with improvements validated using the recall@10 metric, which measures the proportion of relevant passages that appear among the top-10 results. The refined retrieval component identifies pertinent passages within regulatory documents, expediting information access and supporting compliance efforts.
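The recall@10 metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the passage and question IDs, and the `runs`/`qrels` dictionary layout are all hypothetical, chosen only to show how the fraction of relevant passages found in the top-k results might be computed and averaged over questions.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant passage IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        # No relevant passages are known for this question, so recall is undefined;
        # returning 0.0 is an assumption made for this sketch.
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10
) -> float:
    """Average recall@k over all questions that have at least one relevant passage."""
    scores = [
        recall_at_k(runs.get(qid, []), relevant, k)
        for qid, relevant in qrels.items()
        if relevant
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked passage IDs per question, and the gold relevant sets.
runs = {"q1": ["p1", "p5", "p2"], "q2": ["p3", "p4"]}
qrels = {"q1": {"p1", "p2"}, "q2": {"p9"}}

score = mean_recall_at_k(runs, qrels, k=10)
```

In this toy data, both relevant passages for `q1` are retrieved and the single relevant passage for `q2` is missed, so the per-question recalls are 1.0 and 0.0. Note that the denominator is the number of relevant passages, not k, which is what distinguishes recall@k from precision@k.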
Anthology ID:
2025.regnlp-1.15
Volume:
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Tuba Gokhan, Kexin Wang, Iryna Gurevych, Ted Briscoe
Venues:
RegNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
97–101
URL:
https://aclanthology.org/2025.regnlp-1.15/
Cite (ACL):
Ozan Bayer, Elif Nehir Ulu, Yasemin Sarkın, Ekrem Sütçü, Defne Buse Çelik, Alper Karamanlıoğlu, İsmail Karakaya, and Berkan Demirel. 2025. A REGNLP Framework: Developing Retrieval-Augmented Generation for Regulatory Document Analysis. In Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025), pages 97–101, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
A REGNLP Framework: Developing Retrieval-Augmented Generation for Regulatory Document Analysis (Bayer et al., RegNLP 2025)
PDF:
https://aclanthology.org/2025.regnlp-1.15.pdf