2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Charlotte Siska | Katerina Marazopoulou | Melissa Ailem | James Bono
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Benchmarks have emerged as the central approach for evaluating Large Language Models (LLMs). The research community often relies on a model’s average performance across the test prompts of a benchmark to evaluate the model’s performance. This is consistent with the assumption that the test prompts within a benchmark represent a random sample from some real-world distribution of interest. We note that this is generally not the case; instead, we hold that the distribution of interest varies according to the specific use case. Hence, we analyze the robustness of LLM benchmarks to their underlying distributional assumptions. We find that (1) the correlation in model performance across test prompts is non-random, (2) accounting for correlations across test prompts can change model rankings on major benchmarks, and (3) explanatory factors for these correlations include semantic similarity and common LLM failure points.
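As a rough illustration of finding (1), the sketch below builds a hypothetical models-by-prompts correctness matrix with cluster structure (standing in for semantically similar prompts) and compares the observed cross-prompt correlation against a permutation baseline; all data, sizes, and thresholds are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_clusters, per_cluster = 20, 10, 10

# Hypothetical correctness matrix: prompts come in semantic clusters, and
# each model has cluster-specific strengths, so correctness is correlated
# across prompts within a cluster (rows = models, columns = prompts).
ability = rng.normal(size=(n_models, 1))
cluster_skill = rng.normal(size=(n_models, n_clusters))
logits = ability + np.repeat(cluster_skill, per_cluster, axis=1)
scores = (logits + rng.normal(scale=0.5, size=logits.shape) > 0).astype(int)

def mean_abs_prompt_correlation(mat):
    """Mean absolute pairwise correlation between prompt columns."""
    corr = np.corrcoef(mat.T)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.nanmean(np.abs(off_diag))

observed = mean_abs_prompt_correlation(scores)

# Permutation baseline: shuffling each model's row destroys any structure
# tied to specific prompts while preserving each model's overall accuracy.
baseline = np.array([
    mean_abs_prompt_correlation(np.apply_along_axis(rng.permutation, 1, scores))
    for _ in range(200)
])

# An observed value far in the upper tail of the baseline suggests the
# cross-prompt correlation is non-random.
print(f"observed={observed:.3f}  baseline={baseline.mean():.3f}  "
      f"p={np.mean(baseline >= observed):.3f}")
```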
2022
Encouraging Neural Machine Translation to Satisfy Terminology Constraints
Melissa Ailem | Jingshu Liu | Raheel Qader
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale
We present a new approach to encourage neural machine translation to satisfy lexical constraints. Our method acts at the training step, thereby avoiding any extra computational overhead at inference. The proposed method combines three main ingredients. The first consists in augmenting the training data to specify the constraints; intuitively, this encourages the model to learn a copy behavior when it encounters constraint terms. Compared to previous work, we use a simplified augmentation strategy without source factors. The second ingredient is constraint token masking, which makes it even easier for the model to learn the copy behavior and to generalize better. The third is a modification of the standard cross-entropy loss that biases the model towards assigning high probabilities to constraint words. Empirical results show that our method improves upon related baselines in terms of both BLEU score and the percentage of generated constraint terms.
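A minimal sketch of the first and third ingredients, assuming PyTorch and illustrative conventions (the `<sep>` tag, the weighting scheme, and the weight value are placeholders, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

def augment_source(src_tokens, constraint_terms):
    """Append the target-side constraint terms to the source sentence,
    so the model can learn to copy them (the <sep> tag is a placeholder,
    not necessarily the paper's actual format)."""
    augmented = list(src_tokens)
    for term in constraint_terms:
        augmented += ["<sep>"] + term
    return augmented

# Example: specify that "taux d'intérêt" must be rendered as "interest rate".
print(augment_source(["le", "taux", "d'intérêt", "augmente"],
                     [["interest", "rate"]]))

def constraint_weighted_ce(logits, targets, constraint_mask, weight=2.0):
    """Cross entropy that up-weights constraint positions, biasing the
    model towards assigning constraint words high probability
    (the weight value is illustrative)."""
    per_token = F.cross_entropy(logits, targets, reduction="none")
    weights = 1.0 + (weight - 1.0) * constraint_mask.float()
    return (weights * per_token).mean()

# Toy usage: 5 target positions over a 10-word vocabulary; positions 2-3
# belong to a constraint term.
logits = torch.randn(5, 10)
targets = torch.randint(10, (5,))
mask = torch.tensor([False, False, True, True, False])
print(constraint_weighted_ce(logits, targets, mask))
```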
Lingua Custodia’s Participation at the WMT 2022 Word-Level Auto-completion Shared Task
Melissa Ailem | Jingshu Liu | Jean-Gabriel Barthelemy | Raheel Qader
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents Lingua Custodia’s submission to the WMT22 shared task on Word-Level Auto-completion (WLAC). We consider two directions, namely German-English and English-German. The WLAC task in Neural Machine Translation (NMT) consists in predicting a target word given a few human-typed characters, the source sentence to translate, and some translation context. Inspired by recent work on terminology control, we propose to treat the human-typed sequence as a constraint and to predict the right word starting with it. To do so, the source side of the training data is augmented with both the constraints and the translation context. In addition, following recent advances in WLAC, we use a joint optimization strategy that takes into account several types of translation context. The automatic and human accuracy scores obtained with the submitted systems show the effectiveness of the proposed method.
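A rough sketch of how such an augmented source might be assembled, treating the typed characters as a prefix constraint; the tag tokens and field layout here are hypothetical placeholders rather than the submission's actual format:

```python
def build_wlac_input(source, left_context, right_context, typed_prefix):
    """Assemble an augmented source for word-level auto-completion: the
    source sentence, the surrounding translation context, and the
    human-typed characters treated as a prefix constraint.
    The tag tokens are hypothetical placeholders."""
    return (
        source
        + " <lctx> " + left_context
        + " <rctx> " + right_context
        + " <typed> " + typed_prefix
    )

# German-English example: the user has typed "tra" and the model should
# complete it to a full target word such as "translation".
print(build_wlac_input(
    source="maschinelle Übersetzung ist nützlich",
    left_context="machine",
    right_context="is useful",
    typed_prefix="tra",
))
```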
2021
Lingua Custodia’s Participation at the WMT 2021 Machine Translation Using Terminologies Shared Task
Melissa Ailem | Jingshu Liu | Raheel Qader
Proceedings of the Sixth Conference on Machine Translation
This paper describes Lingua Custodia’s submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second change is constraint token masking, whose purpose is to ease copy behavior learning and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.
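The constraint token masking change can be sketched as follows; the mask token, masking probability, and augmentation layout are illustrative assumptions, not the system's exact configuration:

```python
import random

def mask_constraint_tokens(src_tokens, constraint_spans, mask_prob=0.5,
                           mask_token="<mask>"):
    """Randomly replace source-side constraint tokens with a mask token,
    so the model must rely on the appended target-side term and learns a
    copy behavior (the probability value is illustrative)."""
    masked = list(src_tokens)
    for start, end in constraint_spans:
        if random.random() < mask_prob:
            for i in range(start, end):
                masked[i] = mask_token
    return masked

# Example: source tokens 1-2 ("taux", "d'intérêt") form a constraint whose
# target-side term ("interest rate") has been appended after a separator.
random.seed(1)
print(mask_constraint_tokens(
    ["le", "taux", "d'intérêt", "augmente", "<sep>", "interest", "rate"],
    constraint_spans=[(1, 3)],
))
```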
Encouraging Neural Machine Translation to Satisfy Terminology Constraints
Melissa Ailem | Jingshu Liu | Raheel Qader
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2018
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
Melissa Ailem | Bowen Zhang | Aurelien Bellet | Pascal Denis | Fei Sha
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g., features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple a skip-gram model for co-occurrence in linguistic data with a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains results that are competitive with or stronger than those of other state-of-the-art multimodal models.
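As a loose stand-in for the paper's generative model (not its actual factorization), the sketch below couples a skip-gram negative-sampling objective with a visual reconstruction term through shared latent visual factors; all sizes and the squared-error visual term are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: vocabulary, embedding dim, latent visual factors,
# and visual feature dim (e.g. CNN features).
V, D, K, F_VIS = 1000, 64, 16, 512

word_emb = torch.randn(V, D, requires_grad=True)     # textual word vectors
ctx_emb = torch.randn(V, D, requires_grad=True)      # context vectors
vis_factors = torch.randn(D, K, requires_grad=True)  # word -> latent visual factors
vis_decoder = torch.randn(K, F_VIS, requires_grad=True)

def joint_loss(center, context, negatives, visual_feats):
    """Skip-gram negative-sampling term on word-context pairs, plus a
    squared-error reconstruction term on visual features; both share the
    same word embedding through the latent visual factors."""
    w = word_emb[center]                                  # (B, D)
    pos = (w * ctx_emb[context]).sum(-1)                  # (B,)
    neg = torch.einsum("bd,bnd->bn", w, ctx_emb[negatives])
    skip_gram = -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())

    z = w @ vis_factors                                   # (B, K) latent factors
    recon = z @ vis_decoder                               # (B, F_VIS)
    visual = F.mse_loss(recon, visual_feats)
    return skip_gram + visual

# Toy batch: 4 word-context pairs, 3 negatives each, and (hypothetical)
# visual features aligned with the center words.
loss = joint_loss(
    center=torch.randint(V, (4,)),
    context=torch.randint(V, (4,)),
    negatives=torch.randint(V, (4, 3)),
    visual_feats=torch.randn(4, F_VIS),
)
loss.backward()
print(float(loss))
```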