Raheel Qader - ACL Anthology

Raheel Qader

2025

DOLFIN - Document-Level Financial Test-Set for Machine Translation
Mariam Nakhle | Marco Dinarelli | Raheel Qader | Emmanuelle Esperança-Rodier | Hervé Blanchon
Findings of the Association for Computational Linguistics: NAACL 2025

Despite the strong research interest in document-level Machine Translation (MT), the test-sets dedicated to this task are still scarce. The existing test-sets mainly cover topics from the general domain and fall short on specialised domains, such as legal and financial. Also, despite their document-level aspect, they still follow a sentence-level logic that doesn’t allow for including certain linguistic phenomena such as information reorganisation. In this work, we aim to fill this gap by proposing a novel test-set : DOLFIN. The dataset is built from specialised financial documents and it makes a step towards true document-level MT by abandoning the paradigm of perfectly aligned sentences, presenting data in units of sections rather than sentences. The test-set consists of an average of 1950 aligned sections for five language pairs. We present the detailed data collection pipeline that can serve as inspiration for aligning new document-level datasets. We demonstrate the usefulness and the quality of this test-set with the evaluation of a series of models. Our results show that the test-set is able to discriminate between context-sensitive and context-agnostic models and shows the weaknesses when models fail to accurately translate financial texts. The test-set will be made public for the community.

GraphRAG: Leveraging Graph-Based Efficiency to Minimize Hallucinations in LLM-Driven RAG for Finance Data
Mariam Barry | Gaetan Caillaut | Pierre Halftermeyer | Raheel Qader | Mehdi Mouayad | Fabrice Le Deit | Dimitri Cariolaro | Joseph Gesnouin
Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK)

This study explores the integration of graph-based methods into Retrieval-Augmented Generation (RAG) systems to enhance efficiency, reduce hallucinations, and improve explainability, with a particular focus on financial and regulatory document retrieval. We propose two strategies—FactRAG and HybridRAG—which leverage knowledge graphs to improve RAG performance. Experiments conducted using Finance Bench, a benchmark for AI in finance, demonstrate that these approaches achieve a 6% reduction in hallucinations and an 80% decrease in token usage compared to conventional RAG methods. Furthermore, we evaluate HybridRAG by comparing the Digital Operational Resilience Act (DORA) from the European Union with the Federal Financial Institutions Examination Council (FFIEC) guidelines from the United States. The results reveal a significant improvement in computational efficiency, reducing contradiction detection complexity from O(n²) to O(k ⋅ n)—where n is the number of chunks—and a remarkable 734-fold decrease in token consumption. Graph-based retrieval methods can improve the efficiency and cost-effectiveness of large language model (LLM) applications, though their performance and token usage depend on the dataset, knowledge graph design, and retrieval task.

2024

Améliorer la traduction au niveau du document grâce au sur-echantillage négatif et au masquage ciblé
Gaëtan Caillaut | Mariam Nakhlé | Jingshu Liu | Raheel Qader
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Ces travaux visent à améliorer les capacités des systèmes de traduction automatique à tenir compte du contexte dans lequel se trouve la phrase source, et donc, ultimement, à améliorer les performances globales des systèmes de traduction automatique. L’approche que nous proposons repose uniquement sur les données et la manière dont elles sont fournies au modèle durant l’entraînement et est complètement agnostique de l’architecture du modèle. Nous montrons que les performances des modèles de traduction, sur la paire en-fr, peuvent être améliorées simplement en fournissant des données plus pertinentes vis-à-vis de la tâche cible, et ce sans modifier ni complexifier les architectures existantes, en particulier l’architecture Transformer couramment utilisée par les systèmes de TAL modernes. Pour ce faire, nous présentons deux stratégies d’augmentation de données (sur-échantillonnage négatif et masquage ciblé) conçues pour inciter le modèle à s’appuyer sur le contexte. Nous montrons, au travers de métriques appropriées, que ces méthodes permettent d’améliorer les performances des systèmes de traduction sans pour autant modifier ni l’architecture du modèle, ni le processus d’entraînement.

Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task
Gaëtan Caillaut | Mariam Nakhlé | Raheel Qader | Jingshu Liu | Jean-Gabriel Barthélemy
Proceedings of the Ninth Conference on Machine Translation

Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet, the machine translation field has been largely dominated by encoder-decoder models based on the Transformer architecture. As a consequence, scaling laws of encoder-decoder models for neural machine translation have already been well studied, but decoder-only models have received less attention.This work explores the scaling laws of decoder-only models on the multilingual and multidomain translation task. We trained a collection of six decoder-only models, ranging from 70M to 7B parameters, on a sentence-level, multilingual (8 languages) and multidomain (9 domains) dataset. We conducted a series of experiments showing that the loss of decoder-only models can be estimated using a scaling law similar to the one discovered for large language models, but we also show that this scaling law has difficulties to generalize to too large models or to a different data distribution. We also study different scaling methods and show that scaling the depth and the width of a model lead to similar test loss improvements, but with different impact on the model’s efficiency.

Exploring NMT Explainability for Translators Using NMT Visualising Tools
Gabriela Gonzalez-Saez | Mariam Nakhle | James Turner | Fabien Lopez | Nicolas Ballier | Marco Dinarelli | Emmanuelle Esperança-Rodier | Sui He | Raheel Qader | Caroline Rossi | Didier Schwab | Jun Yang
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

This paper describes work in progress on Visualisation tools to foster collaborations between translators and computational scientists. We aim to describe how visualisation features can be used to explain translation and NMT outputs. We tested several visualisation functionalities with three NMT models based on Chinese-English, Spanish-English and French-English language pairs. We created three demos containing different visualisation tools and analysed them within the framework of performance-explainability, focusing on the translator’s perspective.

2023

Large Language Model Adaptation for Financial Sentiment Analysis
Pau Rodriguez Inserte | Mariam Nakhlé | Raheel Qader | Gaetan Caillaut | Jingshu Liu
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

Natural language processing (NLP) has recently gained relevance within financial institutions by providing highly valuable insights into companies and markets’ financial documents. However, the landscape of the financial domain presents extra challenges for NLP, due to the complexity of the texts and the use of specific terminology. Generalist language models tend to fall short in tasks specifically tailored for finance, even when using large language models (LLMs) with great natural language understanding and generative capabilities. This paper presents a study on LLM adaptation methods targeted at the financial domain and with high emphasis on financial sentiment analysis. To this purpose, two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies. We show that through careful fine-tuning on both financial documents and instructions, these foundation models can be adapted to the target domain. Moreover, we observe that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data. In addition to the models, we show how to generate artificial instructions through LLMs to augment the number of samples of the instruction dataset.

2022

Lingua Custodia’s Participation at the WMT 2022 Word-Level Auto-completion Shared Task
Melissa Ailem | Jingshu Liu | Jean-gabriel Barthelemy | Raheel Qader
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents Lingua Custodia’s submission to the WMT22 shared task on Word Level Auto-completion (WLAC). We consider two directions, namely German-English and English-German.The WLAC task in Neural Machine Translation (NMT) consists in predicting a target word given few human typed characters, the source sentence to translate, as well as some translation context. Inspired by recent work in terminology control, we propose to treat the human typed sequence as a constraint to predict the right word starting by the latter. To do so, the source side of the training data is augmented with both the constraints and the translation context. In addition, following new advances in WLAC, we use a joint optimization strategy taking into account several types of translation context. The automatic as well as human accuracy obtained with the submitted systems show the effectiveness of the proposed method.

Encouraging Neural Machine Translation to Satisfy Terminology Constraints.
Melissa Ailem | Jingshu Liu | Raheel Qader
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Encouraging Neural Machine Translation to Satisfy Terminology Constraints. We present a new approach to encourage neural machine translation to satisfy lexical constraints. Our method acts at the training step and thereby avoiding the introduction of any extra computational overhead at inference step. The proposed method combines three main ingredients. The first one consists in augmenting the training data to specify the constraints. Intuitively, this encourages the model to learn a copy behavior when it encounters constraint terms. Compared to previous work, we use a simplified augmentation strategy without source factors. The second ingredient is constraint token masking, which makes it even easier for the model to learn the copy behavior and generalize better. The third one, is a modification of the standard cross entropy loss to bias the model towards assigning high probabilities to constraint words. Empirical results show that our method improves upon related baselines in terms of both BLEU score and the percentage of generated constraint terms.

2021

Lingua Custodia’s Participation at the WMT 2021 Machine Translation Using Terminologies Shared Task
Melissa Ailem | Jingshu Liu | Raheel Qader
Proceedings of the Sixth Conference on Machine Translation

This paper describes Lingua Custodia’s submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second change is constraint token masking, whose purpose is to ease copy behavior learning and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.

Encouraging Neural Machine Translation to Satisfy Terminology Constraints
Melissa Ailem | Jingshu Liu | Raheel Qader
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Controllable Neural Natural Language Generation: comparison of state-of-the-art control strategies
Yuanmin Leng | François Portet | Cyril Labbé | Raheel Qader
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

Most NLG systems target text fluency and grammatical correctness, disregarding control over text structure and length. However, control over the output plays an important part in industrial NLG applications. In this paper, we study different strategies of control in triple-totext generation systems particularly from the aspects of text structure and text length. Regarding text structure, we present an approach that relies on aligning the input entities with the facts in the target side. It makes sure that the order and the distribution of entities in both the input and the text are the same. As for control over text length, we show two different approaches. One is to supply length constraint as input while the other is to force the end-ofsentence tag to be included at each step when using top-k decoding strategy. Finally, we propose four metrics to assess the degree to which these methods will affect a NLG system’s ability to control text structure and length. Our analyses demonstrate that all the methods enhance the system’s ability with a slight decrease in text fluency. In addition, constraining length at the input level performs much better than control at decoding level.

Seq2SeqPy: A Lightweight and Customizable Toolkit for Neural Sequence-to-Sequence Modeling
Raheel Qader | François Portet | Cyril Labbe
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present Seq2SeqPy a lightweight toolkit for sequence-to-sequence modeling that prioritizes simplicity and ability to customize the standard architectures easily. The toolkit supports several known architectures such as Recurrent Neural Networks, Pointer Generator Networks, and transformer model. We evaluate the toolkit on two datasets and we show that the toolkit performs similarly or even better than a very widely used sequence-to-sequence toolkit.

2019

Fine-Grained Control of Sentence Segmentation and Entity Positioning in Neural NLG
Kritika Mehta | Raheel Qader | Cyril Labbe | François Portet
Proceedings of the 1st Workshop on Discourse Structure in Neural NLG

The move from pipeline Natural Language Generation (NLG) approaches to neural end-to-end approaches led to a loss of control in sentence planning operations owing to the conflation of intermediary micro-planning stages into a single model. Such control is highly necessary when the text should be tailored to respect some constraints such as which entity to be mentioned first, the entity position, the complexity of sentences, etc. In this paper, we introduce fine-grained control of sentence planning in neural data-to-text generation models at two levels - realization of input entities in desired sentences and realization of the input entities in the desired position among individual sentences. We show that by augmenting the input with explicit position identifiers, the neural model can achieve a great control over the output structure while keeping the naturalness of the generated text intact. Since sentence level metrics are not entirely suitable to evaluate this task, we used a metric specific to our task that accounts for the model’s ability to achieve control. The results demonstrate that the position identifiers do constraint the neural model to respect the intended output structure which can be useful in a variety of domains that require the generated text to be in a certain structure.

Semi-Supervised Neural Text Generation by Joint Learning of Natural Language Generation and Natural Language Understanding Models
Raheel Qader | François Portet | Cyril Labbé
Proceedings of the 12th International Conference on Natural Language Generation

In Natural Language Generation (NLG), End-to-End (E2E) systems trained through deep learning have recently gained a strong interest. Such deep models need a large amount of carefully annotated data to reach satisfactory performance. However, acquiring such datasets for every new NLG application is a tedious and time-consuming task. In this paper, we propose a semi-supervised deep learning scheme that can learn from non-annotated data and annotated data when available. It uses a NLG and a Natural Language Understanding (NLU) sequence-to-sequence models which are learned jointly to compensate for the lack of annotation. Experiments on two benchmark datasets show that, with limited amount of annotated data, the method can achieve very competitive results while not using any pre-processing or re-scoring tricks. These findings open the way to the exploitation of non-annotated datasets which is the current bottleneck for the E2E NLG system development to new applications.

2018

Generation of Company descriptions using concept-to-text and text-to-text deep models: dataset collection and systems evaluation
Raheel Qader | Khoder Jneid | François Portet | Cyril Labbé
Proceedings of the 11th International Conference on Natural Language Generation

In this paper we study the performance of several state-of-the-art sequence-to-sequence models applied to generation of short company descriptions. The models are evaluated on a newly created and publicly available company dataset that has been collected from Wikipedia. The dataset consists of around 51K company descriptions that can be used for both concept-to-text and text-to-text generation tasks. Automatic metrics and human evaluation scores computed on the generated company descriptions show promising results despite the difficulty of the task as the dataset (like most available datasets) has not been originally designed for machine learning. In addition, we perform correlation analysis between automatic metrics and human evaluations and show that certain automatic metrics are more correlated to human judgments.

2017

Ajout automatique de disfluences pour la synthèse de la parole spontanée : formalisation et preuve de concept (Automatic disfluency insertion towards spontaneous TTS : formalization and proof of concept)
Raheel Qader | Gwénolé Lecorvé | Damien Lolive | Pascale Sébillot
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

Cet article présente un travail exploratoire sur l’ajout automatique de disfluences, c’est-à-dire de pauses, de répétitions et de révisions, dans les énoncés en entrée d’un système de synthèse de la parole. L’objectif est de conférer aux signaux ainsi synthétisés un caractère plus spontané et expressif. Pour cela, nous présentons une formalisation novatrice du processus de production de disfluences à travers un mécanisme de composition de ces disfluences. Cette formalisation se distingue notamment des approches visant la détection ou le nettoyage de disfluences dans des transcriptions, ou de celles en synthèse de la parole qui ne s’intéressent qu’au seul ajout de pauses. Nous présentons une première implémentation de notre processus fondée sur des champs aléatoires conditionnels et des modèles de langage, puis conduisons des évaluations objectives et perceptives. Celles-ci nous permettent de conclure à la fonctionnalité de notre proposition et d’en discuter les pistes principales d’amélioration.

2016

Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques (Pronunciation adaptation for spontaneous speech synthesis using linguistic information)
Raheel Qader | Gwénolé Lecorvé | Damien Lolive | Pascale Sébillot
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

Cet article présente une nouvelle méthode d’adaptation de la prononciation dont le but est de reproduire le style spontané. Il s’agit d’une tâche-clé en synthèse de la parole car elle permet d’apporter de l’expressivité aux signaux produits, ouvrant ainsi la voie à de nouvelles applications. La force de la méthode proposée est de ne s’appuyer que sur des informations linguistiques et de considérer un cadre probabiliste pour ce faire, précisément les champs aléatoires conditionnels. Dans cet article, nous étudions tout d’abord la pertinence d’un ensemble d’informations pour l’adaptation, puis nous combinons les informations les plus pertinentes lors d’expériences finales. Les évaluations de la méthode sur un corpus de parole conversationnelle en anglais montrent que les prononciations adaptées reflètent significativement mieux un style spontané que les prononciations canoniques.