Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)

Tuba Gokhan, Kexin Wang, Iryna Gurevych, Ted Briscoe (Editors)

Anthology ID: 2025.regnlp-1
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Venues: RegNLP | WS
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2025.regnlp-1/
PDF: https://aclanthology.org/2025.regnlp-1.pdf

Shared Task RIRAG-2025: Regulatory Information Retrieval and Answer Generation
Tuba Gokhan | Kexin Wang | Iryna Gurevych | Ted Briscoe

This paper provides an overview of the Shared Task RIRAG-2025, which focused on advancing the field of Regulatory Information Retrieval and Answer Generation (RIRAG). The task was designed to evaluate methods for answering regulatory questions using the ObliQA dataset. This paper summarizes the shared task, participants’ methods, and the results achieved by various teams.

Challenges in Technical Regulatory Text Variation Detection
Shriya Vaagdevi Chikati | Samuel Larkin | David Minicola | Chi-kiu Lo

We present a preliminary study on the feasibility of using current natural language processing techniques to detect variations between the construction codes of different jurisdictions. We formulate the task as a sentence alignment problem and evaluate various sentence representation models for their performance in this task. Our results show that task-specific trained embeddings perform marginally better than other models, but the overall accuracy remains a challenge. We also show that domain-specific fine-tuning hurts the task performance. The results highlight the challenges of developing NLP applications for technical regulatory texts.

Bilingual BSARD: Extending Statutory Article Retrieval to Dutch
Ehsan Lotfi | Nikolay Banar | Nerses Yuzbashyan | Walter Daelemans

Statutory article retrieval plays a crucial role in making legal information more accessible to both laypeople and legal professionals. Multilingual countries like Belgium present unique challenges for retrieval models due to the need for handling legal issues in multiple languages. Building on the Belgian Statutory Article Retrieval Dataset (BSARD) in French, we introduce the bilingual version of this dataset, bBSARD. The dataset contains parallel Belgian statutory articles in both French and Dutch, along with legal questions from BSARD and their Dutch translation. Using bBSARD, we conduct extensive benchmarking of retrieval models available for Dutch and French. Our benchmarking setup includes lexical models, zero-shot dense models, and fine-tuned small foundation models. Our experiments show that BM25 remains a competitive baseline compared to many zero-shot dense models in both languages. We also observe that while proprietary models outperform open alternatives in the zero-shot setting, they can be matched or surpassed by fine-tuning small language-specific models. Our dataset and evaluation code are publicly available.

Unifying Large Language Models and Knowledge Graphs for efficient Regulatory Information Retrieval and Answer Generation
Kishore Vanapalli | Aravind Kilaru | Omair Shafiq | Shahzad Khan

In a rapidly changing socio-economic landscape, regulatory documents play a pivotal role in shaping responses to emerging challenges. An efficient regulatory document monitoring system is crucial for addressing the complexities of a dynamically evolving world, enabling prompt crisis response, simplifying compliance, and empowering data-driven decision-making. In this work, we present a novel comprehensive analytical framework, PolicyInsight, which is based on a specialized regulatory data model and state-of-the-art NLP techniques of Large Language Models (LLMs) and Knowledge Graphs to derive timely insights, facilitating data-driven decision-making and fostering a more transparent and informed governance ecosystem for regulators, businesses, and citizens.

A Hybrid Approach to Information Retrieval and Answer Generation for Regulatory Texts
Jhon Stewar Rayo Mosquera | Carlos Raul De La Rosa Peredo | Mario Garrido Cordoba

This paper presents the system description of our entry for the COLING 2025 RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation) challenge, focusing on leveraging advanced information retrieval and answer generation techniques in regulatory domains. We experimented with a combination of embedding models, including Stella, BGE, CDE, and Mpnet, and leveraged fine-tuning and reranking for retrieving relevant documents in top ranks. We utilized a novel approach, LeSeR, which achieved competitive results with a recall@10 of 0.8201 and map@10 of 0.6655 for retrievals. This work highlights the transformative potential of natural language processing techniques in regulatory applications, offering insights into their capabilities for implementing a retrieval augmented generation system while identifying areas for future improvement in robustness and domain adaptation.
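The recall@10 and map@10 figures quoted above are standard ranked-retrieval metrics. The following is a minimal sketch of how such metrics are commonly computed for a single query, not the shared task's official evaluation code. The function names and passage identifiers are hypothetical, and the average-precision normalizer `min(len(relevant), k)` is one common convention that may differ from the one used on the leaderboard.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant passages that appear in the top-k results.

    Assumes passage ids in `ranked` are unique.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values taken at each rank where a relevant passage occurs.

    Normalized by min(len(relevant), k); this choice is an assumption.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


# Hypothetical single-query example.
ranked_ids = ["p3", "p1", "p7", "p2"]
relevant_ids = {"p1", "p2", "p9"}
recall = recall_at_k(ranked_ids, relevant_ids)
ap = average_precision_at_k(ranked_ids, relevant_ids)
```

MAP@10 over a test set would then be the mean of `average_precision_at_k` across all queries.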

MST-R: Multi-Stage Tuning for Retrieval Systems and Metric Evaluation
Yash Malviya | Karan Dhingra | Maneesh Singh

Regulatory documents are rich in nuanced terminology and specialized semantics. FRAG systems, i.e., frozen retrieval-augmented generators that use pre-trained (frozen) components, consequently face challenges in both retriever and answering performance. We present a system that adapts the retriever performance to the target domain using a multi-stage tuning (MST) strategy. Our retrieval approach, called MST-R, (a) first fine-tunes encoders used in vector stores using hard negative mining, (b) then uses a hybrid retriever, combining sparse and dense retrievers using reciprocal rank fusion, and (c) finally adapts the cross-attention encoder by fine-tuning it only on the top-k retrieved results. We benchmark the system performance on the dataset released for the RIRAG challenge (as part of the RegNLP workshop at COLING 2025). We achieve significant performance gains, obtaining a top rank on the RegNLP challenge leaderboard. We also show that a trivial answering approach *games* the RePASs metric, outscoring all baselines and a pre-trained Llama model. Analyzing this anomaly, we present important takeaways for future research. We also release our [code base](https://github.com/Indic-aiDias/MST-R).
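Step (b) of the abstract above combines sparse and dense retrievers with reciprocal rank fusion. As a rough illustration of the general technique (not the authors' released implementation), each passage can be scored by summing `1 / (k + rank)` over the rankings in which it appears. The function name, the example ids, and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of passage ids into one list.

    A passage's fused score is the sum of 1 / (k + rank) over every input
    ranking that contains it (ranks start at 1). Ties are broken by id so
    the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a sparse (e.g. BM25) and a dense retriever.
sparse_run = ["a", "b", "c"]
dense_run = ["b", "d", "a"]
fused = reciprocal_rank_fusion([sparse_run, dense_run])
```

Because only ranks are used, the fusion does not require the sparse and dense scores to be on comparable scales.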

AUEB-Archimedes at RIRAG-2025: Is Obligation concatenation really all you need?
Ioannis Chasandras | Odysseas S. Chlapanis | Ion Androutsopoulos

This paper presents the systems we developed for RIRAG-2025, a shared task that requires answering regulatory questions by retrieving relevant passages. The generated answers are evaluated using RePASs, a reference-free and model-based metric. Our systems use a combination of three retrieval models and a reranker. We show that by exploiting a neural component of RePASs that extracts important sentences (‘obligations’) from the retrieved passages, we achieve a dubiously high score (0.947), even though the answers are directly extracted from the retrieved passages and are not actually generated answers. We then show that by selecting the answer with the best RePASs among a few generated alternatives and then iteratively refining this answer by reducing contradictions and covering more obligations, we can generate readable, coherent answers that achieve a more plausible and relatively high score (0.639).

Structured Tender Entities Extraction from Complex Tables with Few-short Learning
Asim Abbas | Mark Lee | Niloofer Shanavas | Venelin Kovatchev | Mubashir Ali

Extracting structured text from complex tables in PDF tender documents remains a challenging task due to the loss of structural and positional information during the extraction process. AI-based models often require extensive training data, making development from scratch both tedious and time-consuming. Our research focuses on identifying tender entities in complex table formats within PDF documents. To address this, we propose a novel approach utilizing few-shot learning with large language models (LLMs) to restore the structure of extracted text. Additionally, handcrafted rules and regular expressions are employed for precise entity classification. To evaluate the robustness of LLMs with few-shot learning, we employ data-shuffling techniques. Our experiments show that current text extraction tools fail to deliver satisfactory results for complex table structures. However, the few-shot learning approach significantly enhances the structural integrity of extracted data and improves the accuracy of tender entity identification.

A Two-Stage LLM System for Enhanced Regulatory Information Retrieval and Answer Generation
Fengzhao Sun | Jun Yu | Jiaming Hou | Yutong Lin | Tianyu Liu

This technical report describes our methodology for the Regulatory Information Retrieval and Answer Generation (RIRAG) Shared Task, a component of the RegNLP workshop at COLING 2025. The challenge aims to effectively navigate and extract relevant information from regulatory texts to generate precise, coherent answers for compliance and obligation-related queries. To tackle Subtask 1, we introduce a two-stage approach comprising an initial output stage and a subsequent refinement stage. Initially, we fine-tune the LLaMa-2-7B model using LoRA to produce a preliminary output. This is followed by the application of an expert mechanism to enhance the results. For Subtask 2, we design specific prompts to facilitate the generation of high-quality answers. Consequently, our approach has achieved state-of-the-art performance on the leaderboard, which serves as a testament to the effectiveness and competitiveness of our proposed methodology.

NUST Nova at RIRAG 2025: A Hybrid Framework for Regulatory Information Retrieval and Question Answering
Mariam Babar Khan | Huma Ameer | Seemab Latif | Mehwish Fatima

NUST Nova participates in the RIRAG Shared Task, addressing two critical challenges: Task 1 involves retrieving relevant subsections from regulatory documents based on user queries, while Task 2 focuses on generating concise, contextually accurate answers using the retrieved information. We propose a Hybrid Retrieval Framework that combines graph-based retrieval, vector-based methods, and BM25 keyword matching to enhance relevance and precision in regulatory QA. Using score-based fusion and iterative refinement, the framework retrieves the top 10 relevant passages, which are then used by an LLM to generate accurate, context-aware answers. After empirical evaluation, we also conduct an error analysis to identify our framework’s limitations.
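The abstract does not say how the score-based fusion is carried out. One common approach, sketched here purely as an assumption, is to min-max normalize each retriever's scores to [0, 1], take a weighted sum, and keep the top 10 passages. The function names, retriever names, weights, and scores below are all hypothetical.

```python
from typing import Dict, List, Optional


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every passage as equally good.
        return {pid: 1.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def fuse_scores(
    runs: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
    top_k: int = 10,
) -> List[str]:
    """Weighted sum of normalized scores; returns the top_k passage ids."""
    fused: Dict[str, float] = {}
    for name, scores in runs.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        for pid, s in min_max(scores).items():
            fused[pid] = fused.get(pid, 0.0) + w * s
    return sorted(fused, key=lambda p: (-fused[p], p))[:top_k]


# Hypothetical raw scores from two retrievers on different scales.
runs = {
    "bm25": {"p1": 12.0, "p2": 6.0, "p3": 0.0},
    "dense": {"p2": 0.9, "p3": 0.5, "p4": 0.1},
}
top = fuse_scores(runs, top_k=2)
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities are not, so a raw sum would let one retriever dominate.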

NUST Alpha participates in the Regulatory Information Retrieval and Answer Generation (RIRAG) shared task. We propose FusionRAG, which combines OpenAI embeddings, BM25, FAISS, and Rank-Fusion to improve information retrieval and answer generation. We also explore multiple variants of our model to assess the impact of each component on overall performance. FusionRAG's strength comes from our rank fusion and filter strategy. Rank fusion integrates semantic and lexical relevance scores to optimize retrieval accuracy and result diversity, and the filter mechanism removes irrelevant passages before answer generation. Our experiments demonstrate that FusionRAG offers a robust and scalable solution for automating the analysis of regulatory documents, improving compliance efficiency, and mitigating associated risks. We further conduct an error analysis to explore the limitations of our model’s performance.

NUST Omega at RIRAG 2025: Investigating Context-aware Retrieval and Answer Generations-Lessons and Challenges
Huma Ameer | Muhammad Hannan Akram | Seemab Latif | Mehwish Fatima

NUST Omega participates in the Regulatory Information Retrieval and Answer Generation (RIRAG) Shared Task. Regulatory documents pose unique challenges in retrieving and generating precise and relevant answers due to their inherent complexities. We explore the task by proposing a progressive retrieval pipeline and investigate its performance with multiple variants. Some variants include different embeddings to explore their effects on the retrieval score. Other variants examine the inclusion of a keyword-driven query matching technique. After exploring such variations, we include topic modeling in our pipeline to investigate its impact on the performance. We also study the performance of various prompt techniques with our proposed pipeline. With empirical experiments, we find some strengths and limitations in the proposed pipeline. These findings will help the research community by offering valuable insights to make advancements in tackling this complex task.

This paper explains a Retrieval-Augmented Generation (RAG) pipeline that optimizes regulatory compliance using a combination of embedding models (i.e., bge-m3, jina-embeddings-v3, e5-large-v2) with a reranker (i.e., bge-reranker-v2-m3). To efficiently process long context passages, we introduce a context-aware chunking method. By using the RePASs metric, we ensure comprehensive coverage of obligations and minimize contradictions, thereby setting a new benchmark for RAG-based regulatory compliance systems. The experimental results show that our best configuration achieves a score of 0.79 in Recall@10 and 0.66 in MAP@10 with the LLaMA-3.1-8B model for answer generation.

This study presents the development of a Retrieval-Augmented Generation (RAG) framework tailored for analyzing regulatory documents from the Abu Dhabi Global Market (ADGM). The methodology encompasses comprehensive data preprocessing, including extraction, cleaning, and compression of documents, as well as the organization of the ObliQA dataset. An embedding model generates embeddings during the retrieval phase, with the txtai library used to manage embeddings and streamline testing. The training process incorporated strategies such as duplicate recognition, dropout implementation, pooling adjustments, and label modifications to enhance retrieval performance. Hyperparameter tuning further refined the retrieval component, with improvements validated using the recall@10 metric, which measures the proportion of relevant passages that are retrieved within the top-10 results. The refined retrieval component effectively identifies pertinent passages within regulatory documents, expediting information access and supporting compliance efforts.

Regulatory Question-Answering using Generative AI
Devin Quinn | Sumit P. Pai | Iman Yousfi | Nirmala Pudota | Sanmitra Bhattacharya

Although retrieval augmented generation (RAG) has proven to be an effective approach for creating question-answering systems on a corpus of documents, there is a need to improve the performance of these systems, especially in the regulatory domain where clear and accurate answers are required. This paper outlines the methodology used in our submission to the Regulatory Information Retrieval and Answer Generation (RIRAG) shared task at the Regulatory Natural Language Processing Workshop (RegNLP 2025). The goal is to improve document retrieval (Shared Task 1) and answer generation (Shared Task 2). Our pipeline is constructed as a two-step process for Shared Task 1. In the first step, we utilize a text-embedding-ada-002-based retriever, followed by a RankGPT-based re-ranker. The ranked results of Task 1 are then used to generate responses to user queries in Shared Task 2 through a prompt-based approach using GPT-4o. For Shared Task 1, we achieved a recall rate of 75%, and with the prompts we developed, we were able to generate coherent answers for Shared Task 2.

RIRAG: A Bi-Directional Retrieval-Enhanced Framework for Financial Legal QA in ObliQA Shared Task
Xinyan Zhang | Xiaobing Feng | Xiujuan Xu | Zhiliang Zheng | Kai Wu

In professional financial-legal consulting services, accurately and efficiently retrieving and answering legal questions is crucial. Although some breakthroughs have been made in information retrieval and answer generation, few frameworks have successfully integrated these tasks. Therefore, we propose RIRAG (Retrieval-In-the-loop Response and Answer Generation), a bi-directional retrieval-enhanced framework for financial-legal question answering in the ObliQA Shared Task. The system introduces BDD-FinLegal (Bi-Directional Dynamic finance-legal), a novel retrieval mechanism specifically designed for financial-legal documents that combines traditional retrieval algorithms with modern neural network methods. Legal answer generation is implemented through large language models retrained on expert-annotated datasets. Our method significantly improves the professionalism and interpretability of the answers while maintaining high retrieval accuracy. Experiments on the ADGM dataset show that the system achieved a significant improvement in the Recall@10 evaluation metric and was recognized by financial legal experts for the accuracy and professionalism of the answer generation. This study provides new ideas for building efficient and reliable question-answering systems in the financial-legal domain.

Regulatory Natural Language Processing (RegNLP) is a multidisciplinary domain focused on facilitating access to and comprehension of regulations and regulatory requirements. This paper outlines our strategy for creating a system to address the Regulatory Information Retrieval and Answer Generation (RIRAG) challenge, which was conducted during the RegNLP 2025 Workshop. The objective of this competition is to design a system capable of efficiently extracting pertinent passages from regulatory texts (ObliQA) and subsequently generating accurate, cohesive responses to inquiries related to compliance and obligations. Our proposed method employs lightweight BM25 pre-filtering to retrieve relevant passages. This technique efficiently shortlists candidates for subsequent processing with Transformer-based embeddings, thereby optimizing the use of resources.
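The pre-filtering idea described above can be illustrated with a small, self-contained Okapi BM25 scorer that shortlists passages before a more expensive embedding model re-scores them. This is a sketch under stated assumptions rather than the authors' system: the whitespace tokenizer, the parameter values `k1 = 1.5` and `b = 0.75`, the smoothed IDF variant, the function names, and the example passages are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def bm25_shortlist(
    query: str,
    passages: List[str],
    top_n: int = 50,
    k1: float = 1.5,
    b: float = 0.75,
) -> List[int]:
    """Return indices of the top_n passages by BM25 score, best first."""
    docs = [tokenize(p) for p in passages]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of passages containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + length_norm)
        scores.append(score)

    ranked = sorted(range(n_docs), key=lambda i: (-scores[i], i))
    return ranked[:top_n]


# Hypothetical passages; only the shortlisted ones would be embedded next.
passages = [
    "firms must report suspicious transactions",
    "the weather is sunny today",
    "report annual accounts to the regulator",
]
shortlist = bm25_shortlist("report suspicious transactions", passages, top_n=2)
```

Only the passages at the returned indices would then be passed to the Transformer-based embedding stage, which is where the resource savings come from.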