Amal Zouaq


2024

TagDebias: Entity and Concept Tagging for Social Bias Mitigation in Pretrained Language Models
Mehrnaz Moslemi | Amal Zouaq
Findings of the Association for Computational Linguistics: NAACL 2024

Pre-trained language models (PLMs) play a crucial role in various applications, including sensitive domains such as the hiring process. However, extensive research has unveiled that these models tend to replicate social biases present in their pre-training data, raising ethical concerns. In this study, we propose TagDebias, a method that debiases a dataset using entity and concept type tags and then fine-tunes PLMs on the debiased dataset. Experiments show that our proposed TagDebias model, when applied to a ranking task, exhibits significant improvements in bias scores.

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar | Quentin Fournier | Matthew Riemer | Pin-Yu Chen | Amal Zouaq | Payel Das | Sarath Chandar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.
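
As an aside for readers unfamiliar with the setup, the sketch below shows one way the adapter-rank axis could be varied with the Hugging Face peft library; the base model (gpt2 here, standing in for the 7B models studied), target modules, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: attach LoRA adapters of different ranks to a causal LM.
# gpt2 stands in for the 7B models studied in the paper; all values are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

for rank in (4, 8, 16, 64):                       # the "adapter rank" axis
    base = AutoModelForCausalLM.from_pretrained("gpt2")
    config = LoraConfig(
        r=rank,                                   # adapter rank
        lora_alpha=2 * rank,                      # common scaling heuristic
        target_modules=["c_attn"],                # attention projection in GPT-2
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()            # rank drives the trainable-parameter count
```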

MVP: Minimal Viable Phrase for Long Text Understanding
Louis Clouatre | Amal Zouaq | Sarath Chandar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

A recent renewal of interest in long-text understanding has sparked the emergence of high-quality long-text benchmarks, as well as new models demonstrating significant performance improvements on them. However, gauging the implications of these advancements based solely on the length of the input text offers limited insight: such benchmarks may require models to parse long-range dependencies, or merely to locate and comprehend the relevant paragraph within a longer text. This work introduces the Minimal Viable Phrase (MVP), a novel metric that determines, through perturbations to the input text, the shortest average text length that needs to be preserved to execute the task with limited performance degradation. Our evaluation of the popular SCROLLS benchmark reveals that only one of its seven tasks necessitates an MVP of over 512 tokens, the maximum text length manageable by the previous generation of pre-trained models. We highlight the limited need for understanding long-range dependencies in resolving these tasks, discuss the design decisions that appear to make the QuALITY task depend on long-range dependencies, and point out modeling choices that perform particularly well on it.
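
For illustration only, and not the paper's implementation: a minimal sketch of an MVP-style measurement, which truncates inputs to increasingly long prefixes and reports the shortest length whose score stays within a tolerance of the full-text score. The scoring function, tolerance, and whitespace tokenization are assumptions of the sketch.

```python
from typing import Callable, Sequence

def minimal_viable_length(
    examples: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],   # hypothetical task scorer, higher is better
    candidate_lengths: Sequence[int] = (128, 256, 512, 1024, 2048),
    max_degradation: float = 0.05,
) -> int:
    """Shortest truncation length whose score stays within `max_degradation`
    (relative) of the full-text score. Crude whitespace tokens, for illustration."""
    full_score = evaluate(examples)
    for length in sorted(candidate_lengths):
        truncated = [" ".join(x.split()[:length]) for x in examples]
        if evaluate(truncated) >= full_score * (1.0 - max_degradation):
            return length
    return max(candidate_lengths)
```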

2022

Local Structure Matters Most: Perturbation Study in NLU
Louis Clouatre | Prasanna Parthasarathi | Amal Zouaq | Sarath Chandar
Findings of the Association for Computational Linguistics: ACL 2022

Recent research analyzing the sensitivity of natural language understanding models to word-order perturbations has shown that neural models are surprisingly insensitive to the order of words. In this paper, we investigate this phenomenon by developing perturbations that alter the order of words, subwords, and characters, and analyze their effect on neural models’ performance on language understanding tasks. We measure the impact of perturbations on the local neighborhood and the global position of characters in the perturbed texts, and observe that the perturbation functions found in prior literature affect only the global ordering while leaving the local ordering relatively unperturbed. We empirically show that neural models, regardless of their inductive biases, pretraining scheme, or choice of tokenization, mostly rely on the local structure of text to build understanding and make limited use of the global structure.
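
A toy sketch of the local/global contrast (window size and tokenization are illustrative, not the paper's exact perturbation functions): shuffling within small windows preserves local neighborhoods, while a full shuffle destroys both local and global structure.

```python
import random

def local_shuffle(tokens, window=3, seed=0):
    """Shuffle only within fixed-size windows: no token moves far, so local
    neighborhoods are largely preserved."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

def global_shuffle(tokens, seed=0):
    """Shuffle the whole sequence: global order and most local neighborhoods are lost."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out

words = "the quick brown fox jumps over the lazy dog".split()
print(local_shuffle(words))   # neighbors mostly stay together
print(global_shuffle(words))  # word salad
```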

A Copy Mechanism for Handling Knowledge Base Elements in SPARQL Neural Machine Translation
Rose Hirigoyen | Amal Zouaq | Samuel Reyd
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Neural Machine Translation (NMT) models from English to SPARQL are a promising development for SPARQL query generation. However, current architectures are unable to integrate the knowledge base (KB) schema and cannot handle questions about knowledge resources, classes, and properties unseen during training, rendering them unusable outside the scope of topics covered in the training set. Inspired by the performance gains that copy mechanisms have brought to other natural language processing tasks, we propose to integrate a copy mechanism into neural SPARQL query generation as a way to tackle this issue. We illustrate our proposal by adding a copy layer and a dynamic knowledge base vocabulary to two Seq2Seq architectures (CNNs and Transformers). This layer lets the models copy KB elements directly from the questions, instead of generating them. We evaluate our approach on state-of-the-art datasets, including datasets referencing unknown KB elements, and measure the accuracy of the copy-augmented architectures. Our results show a considerable increase in performance on all datasets compared to non-copy architectures.
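
As a rough illustration of the general idea (a pointer-generator-style copy gate, with all names and sizes being assumptions rather than the paper's architecture): the decoder's vocabulary distribution is mixed with an attention-weighted distribution over the source tokens, so identifiers that appear in the question can be copied verbatim.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def copy_augmented_distribution(gen_logits, attn_scores, src_token_ids, p_gen):
    """Mix generation and copy distributions (pointer-generator style, illustrative).

    gen_logits:    (vocab,)   decoder logits over the extended (dynamic) vocabulary
    attn_scores:   (src_len,) attention scores over the source question
    src_token_ids: (src_len,) ids of the source tokens in that vocabulary
    p_gen:         probability of generating rather than copying
    """
    p_final = p_gen * softmax(gen_logits)
    p_copy = (1.0 - p_gen) * softmax(attn_scores)
    for pos, tok in enumerate(src_token_ids):   # scatter copy mass onto source tokens
        p_final[tok] += p_copy[pos]
    return p_final

dist = copy_augmented_distribution(
    gen_logits=np.random.randn(10),
    attn_scores=np.random.randn(4),
    src_token_ids=np.array([2, 7, 7, 9]),       # e.g. KB identifiers seen in the question
    p_gen=0.6,
)
print(dist.sum())                               # ~1.0
```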

Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes
Louis Clouatre | Prasanna Parthasarathi | Amal Zouaq | Sarath Chandar
Findings of the Association for Computational Linguistics: EMNLP 2022

Providing better language tools for low-resource and endangered languages is imperative for equitable growth. Recent progress with massively multilingual pretrained models has proven surprisingly effective at performing zero-shot transfer to a wide variety of languages. However, this transfer is not universal, with many languages not currently understood by multilingual approaches. It is estimated that only 72 languages possess a “small set of labeled datasets” on which a model’s performance could be tested; the vast majority of languages lack even the resources needed to evaluate performance. In this work, we attempt to clarify which languages do and do not currently benefit from such transfer. To that end, we develop a general approach that requires only unlabelled text to detect which languages are not well understood by a cross-lingual model. Our approach is derived from the hypothesis that if a model’s understanding is insensitive to perturbations of text in a language, it is likely to have a limited understanding of that language. We construct a cross-lingual sentence similarity task to evaluate our approach empirically on 350 languages, most of them low-resource.

Local Structure Matters Most in Most Languages
Louis Clouatre | Prasanna Parthasarathi | Amal Zouaq | Sarath Chandar
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Many recent perturbation studies have found unintuitive results about what does and does not matter when performing Natural Language Understanding (NLU) tasks in English. Coding properties, such as the order of words, can often be removed through shuffling without impacting downstream performance. Such insights may be used to direct future research into English NLP models. As many improvements in multilingual settings consist of wholesale adaptation of English approaches, it is important to verify whether those findings replicate in multilingual settings. In this work, we replicate a study on the importance of local structure, and the relative unimportance of global structure, in a multilingual setting. We find that the phenomenon observed for English broadly translates to over 120 languages, with a few caveats.

2021

MLMLM: Link Prediction with Mean Likelihood Masked Language Model
Louis Clouatre | Philippe Trempe | Amal Zouaq | Sarath Chandar
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Ontology Matching Using Convolutional Neural Networks
Alexandre Bento | Amal Zouaq | Michel Gagnon
Proceedings of the Twelfth Language Resources and Evaluation Conference

In order to achieve interoperability of information in the context of the Semantic Web, it is necessary to find effective ways to align different ontologies. As the number of ontologies grows for a given domain, and as overlap between ontologies grows proportionally, it is becoming more and more crucial to develop accurate and reliable techniques to perform this task automatically. While traditional approaches to this challenge are based on string metrics and structure analysis, in this paper we present a methodology to align ontologies automatically using machine learning techniques. Specifically, we use convolutional neural networks to perform string matching between class labels using character embeddings. We also rely on the set of superclasses to select the best alignment. Our results show that we obtain state-of-the-art performance on ontologies from the Ontology Alignment Evaluation Initiative (OAEI). Our model also maintains good performance when tested on a different domain, which suggests potential cross-domain applications.
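
A minimal, untrained sketch of the character-embedding idea (layer sizes, character vocabulary, and the cosine scoring head are assumptions for illustration, not the paper's model): each class label is encoded by a character-level CNN and two labels are compared in the resulting embedding space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharCNNEncoder(nn.Module):
    """Encode a class label from character embeddings with a 1-D convolution."""
    def __init__(self, n_chars=128, emb_dim=32, n_filters=64, kernel_size=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size, padding=1)

    def forward(self, char_ids):                 # (batch, max_len)
        x = self.emb(char_ids).transpose(1, 2)   # (batch, emb_dim, max_len)
        x = F.relu(self.conv(x))                 # (batch, n_filters, max_len)
        return x.max(dim=2).values               # max-pool over character positions

def label_to_ids(label, max_len=32):
    ids = [min(ord(c), 127) for c in label.lower()[:max_len]]
    return torch.tensor([ids + [0] * (max_len - len(ids))])

encoder = CharCNNEncoder()
a = encoder(label_to_ids("AcademicStaff"))
b = encoder(label_to_ids("FacultyMember"))
print(F.cosine_similarity(a, b).item())          # similarity score (untrained weights)
```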

2010

Can Syntactic and Logical Graphs help Word Sense Disambiguation?
Amal Zouaq | Michel Gagnon | Benoit Ozell
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a word sense disambiguation (WSD) approach based on syntactic and logical representations. The objective is to run a number of experiments comparing standard contexts (word windows, sentence windows) with contexts provided by a dependency parser (syntactic context) and a logical analyzer (logico-semantic context). The approach relies on a dependency grammar for the syntactic representations. We also use a pattern knowledge base over the syntactic dependencies to extract flat predicative logical representations. These representations (syntactic and logical) are then used to build context vectors that are exploited in the WSD process. Various state-of-the-art algorithms, including Simplified Lesk, Banerjee and Pedersen’s measure, and co-occurrence frequency, are tested with these syntactic and logical contexts. Preliminary results show that defining context vectors based on these features may improve WSD compared with classical word and sentence context windows. However, further experiments are needed to provide more evidence on these issues.
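
To make the contrast concrete, here is a toy sketch (assumed triple format and toy sentence; not the paper's pipeline) of a window-based context, a dependency-based context, and a simplified-Lesk-style overlap score between a context and a sense gloss.

```python
from collections import Counter

def window_context(tokens, index, size=3):
    """Classical word-window context around the target token."""
    return Counter(tokens[max(0, index - size):index] + tokens[index + 1:index + 1 + size])

def dependency_context(dependencies, target):
    """Syntactic context: words linked to the target by a dependency relation.
    `dependencies` is a list of (head, relation, dependent) triples (assumed format)."""
    ctx = Counter()
    for head, _rel, dep in dependencies:
        if head == target:
            ctx[dep] += 1
        elif dep == target:
            ctx[head] += 1
    return ctx

def lesk_overlap(context, gloss_tokens):
    """Simplified-Lesk-style score: overlap between a context and a sense gloss."""
    return sum((context & Counter(gloss_tokens)).values())

tokens = "the bank approved the loan for the new bridge".split()
deps = [("approved", "nsubj", "bank"), ("approved", "obj", "loan")]
gloss = "a financial institution that accepts deposits and grants a loan".split()
print(lesk_overlap(window_context(tokens, tokens.index("bank")), gloss))
print(lesk_overlap(dependency_context(deps, "bank"), gloss))
```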