Alper Karamanlıoğlu
2025
Enhancing Regulatory Compliance Through Automated Retrieval, Reranking, and Answer Generation
Kübranur Umar | Hakan Doğan | Onur Özcan | İsmail Karakaya | Alper Karamanlıoğlu | Berkan Demirel
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)
This paper presents a Retrieval-Augmented Generation (RAG) pipeline that improves regulatory compliance by combining embedding models (bge-m3, jina-embeddings-v3, e5-large-v2) with a reranker (bge-reranker-v2-m3). To process long passages efficiently, we introduce a context-aware chunking method. Using the RePASS metric, we ensure comprehensive coverage of obligations and minimize contradictions, thereby setting a new benchmark for RAG-based regulatory compliance systems. The experimental results show that our best configuration achieves 0.79 in Recall@10 and 0.66 in MAP@10, with the LLaMA-3.1-8B model used for answer generation.
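For readers unfamiliar with the two retrieval metrics reported above, the following is a minimal sketch of how Recall@10 and MAP@10 are commonly computed. It is not the authors' evaluation code: the function names and the passage IDs are hypothetical, ranked IDs are assumed to be unique, and average precision is normalized by min(number of relevant passages, k), which is one of several conventions in use.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)


def average_precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Average of precision values at each rank where a relevant passage occurs.

    Assumption: normalized by min(len(relevant_ids), k); other tools may
    normalize differently.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant_ids), k)


def mean_average_precision_at_k(runs, k: int = 10) -> float:
    """MAP@k: mean of AP@k over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Hypothetical single-query example with made-up passage IDs.
ranked = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2"}
recall = recall_at_k(ranked, relevant)
ap = average_precision_at_k(ranked, relevant)
```

In this toy query both relevant passages are retrieved, so recall is 1.0, while average precision is lower because the relevant passages sit at ranks 2 and 4 and not at the top.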
A REGNLP Framework: Developing Retrieval-Augmented Generation for Regulatory Document Analysis
Ozan Bayer | Elif Nehir Ulu | Yasemin Sarkın | Ekrem Sütçü | Defne Buse Çelik | Alper Karamanlıoğlu | İsmail Karakaya | Berkan Demirel
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)
This study presents the development of a Retrieval-Augmented Generation (RAG) framework tailored for analyzing regulatory documents from the Abu Dhabi Global Market (ADGM). The methodology encompasses comprehensive data preprocessing, including extraction, cleaning, and compression of documents, as well as the organization of the ObliQA dataset. An embedding model generates embeddings during the retrieval phase, and the txtai library is used to manage them and streamline testing. The training process incorporates strategies such as duplicate recognition, dropout implementation, pooling adjustments, and label modifications to enhance retrieval performance. Hyperparameter tuning further refines the retrieval component, with improvements validated using the recall@10 metric, which measures the proportion of relevant passages that are retrieved within the top-10 results. The refined retrieval component effectively identifies pertinent passages within regulatory documents, expediting information access and supporting compliance efforts.
Co-authors
- Berkan Demirel 2
- İsmail Karakaya 2
- Ozan Bayer 1
- Hakan Doğan 1
- Yasemin Sarkın 1