Ekrem Sütçü


2025

A REGNLP Framework: Developing Retrieval-Augmented Generation for Regulatory Document Analysis
Ozan Bayer | Elif Nehir Ulu | Yasemin Sarkın | Ekrem Sütçü | Defne Buse Çelik | Alper Karamanlıoğlu | İsmail Karakaya | Berkan Demirel
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)

This study presents a Retrieval-Augmented Generation (RAG) framework tailored for analyzing regulatory documents from the Abu Dhabi Global Market (ADGM). The methodology covers data preprocessing (extraction, cleaning, and compression of documents) and the organization of the ObliQA dataset. An embedding model generates the embeddings used in the retrieval phase, and the txtai library manages those embeddings and streamlines testing. Training incorporated several strategies to enhance retrieval performance: duplicate recognition, dropout, pooling adjustments, and label modifications. Hyperparameter tuning further refined the retrieval component, with improvements validated using the recall@10 metric, which measures the proportion of relevant passages that appear among the top-10 results. The refined retrieval component identifies pertinent passages within regulatory documents, expediting information access and supporting compliance efforts.
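The recall@10 metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition (the fraction of a question's relevant passages that appear in the top-k retrieved results, averaged over questions); the function names and passage IDs below are hypothetical and are not taken from the paper or from the ObliQA evaluation code.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int = 10) -> float:
    """Fraction of the relevant passages that appear in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("recall is undefined when there are no relevant passages")
    # Only the first k ranked results count toward the metric.
    top_k = set(retrieved[:k])
    return len(relevant_set & top_k) / len(relevant_set)


def mean_recall_at_k(runs: Sequence[tuple], k: int = 10) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    if not runs:
        raise ValueError("at least one question is required")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical passage IDs, for illustration only.
ranked = ["p3", "p7", "p1", "p9"]
gold = {"p1", "p2"}
score = recall_at_k(ranked, gold, k=10)  # one of the two gold passages is retrieved
```

Under this definition a question with two relevant passages, only one of which is retrieved in the top 10, scores 0.5, regardless of how many irrelevant passages fill the remaining slots.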