Gabriel Bernier-Colborne - ACL Anthology

Gabriel Bernier-Colborne

Also published as: Gabriel Bernier-colborne

2025

Test Set Quality in Multilingual LLM Evaluation
Chalamalasetti Kranti | Gabriel Bernier-Colborne | Yvan Gauthier | Sowmya Vajjala
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems

Several multilingual benchmark datasets have been developed in a semi-automatic manner in the recent past to measure progress and understand the state-of-the-art in the multilingual capabilities of Large Language Models (LLM). However, there is not a lot of attention paid to the quality of the datasets themselves, despite the existence of previous work in identifying errors in even fully human-annotated test sets. In this paper, we manually analyze recent multilingual evaluation sets in two languages – French and Telugu, identifying several errors in the datasets during the process. We compare the performance difference across several LLMs with the original and revised versions of the datasets and identify large differences (almost 10% in some cases) in both languages. Based on these results, we argue that test sets should not be considered immutable and should be revisited, checked for correctness, and potentially versioned. We end with some recommendations for both the dataset creators as well as consumers on addressing the dataset quality issues.

2024

Some Tradeoffs in Continual Learning for Parliamentary Neural Machine Translation Systems
Rebecca Knowles | Samuel Larkin | Michel Simard | Marc A Tessier | Gabriel Bernier-Colborne | Cyril Goutte | Chi-kiu Lo
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

In long-term translation projects, like Parliamentary text, there is a desire to build machine translation systems that can adapt to changes over time. We implement and examine a simple approach to continual learning for neural machine translation, exploring tradeoffs between consistency, the model’s ability to learn from incoming data, and the time a client would need to wait to obtain a newly trained translation system.

Methods, Applications, and Directions of Learning-to-Rank in NLP Research
Justin Lee | Gabriel Bernier-Colborne | Tegan Maharaj | Sowmya Vajjala
Findings of the Association for Computational Linguistics: NAACL 2024

Learning-to-rank (LTR) algorithms aim to order a set of items according to some criteria. They are at the core of applications such as web search and social media recommendations, and are an area of rapidly increasing interest, with the rise of large language models (LLMs) and the widespread impact of these technologies on society. In this paper, we survey the diverse use cases of LTR methods in natural language processing (NLP) research, looking at previously under-studied aspects such as multilingualism in LTR applications and statistical significance testing for LTR problems. We also consider how large language models are changing the LTR landscape. This survey is aimed at NLP researchers and practitioners interested in understanding the formalisms and best practices regarding the application of LTR approaches in their research.

2023

Dialect and Variant Identification as a Multi-Label Classification Task: A Proposal Based on Near-Duplicate Analysis
Gabriel Bernier-colborne | Cyril Goutte | Serge Leger
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

We argue that dialect identification should be treated as a multi-label classification problem rather than the single-class setting prevalent in existing collections and evaluations. In order to avoid extensive human re-labelling of the data, we propose an analysis of ambiguous near-duplicates in an existing collection covering four variants of French.We show how this analysis helps us provide multiple labels for a significant subset of the original data, therefore enriching the annotation with minimal human intervention. The resulting data can then be used to train dialect identifiers in a multi-label setting. Experimental results show that on the enriched dataset, the multi-label classifier produces similar accuracy to the single-label classifier on test cases that are unambiguous (single label), but it increases the macro-averaged F1-score by 0.225 absolute (71% relative gain) on ambiguous texts with multiple labels. On the original data, gains on the ambiguous test cases are smaller but still considerable (+0.077 absolute, 20% relative gain), and accuracy on non-ambiguous test cases is again similar in this case. This supports our thesis that modelling dialect identification as a multi-label problem potentially has a positive impact.

2022

Transfer Learning Improves French Cross-Domain Dialect Identification: NRC @ VarDial 2022
Gabriel Bernier-Colborne | Serge Leger | Cyril Goutte
Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects

We describe the systems developed by the National Research Council Canada for the French Cross-Domain Dialect Identification shared task at the 2022 VarDial evaluation campaign. We evaluated two different approaches to this task: SVM and probabilistic classifiers exploiting n-grams as features, and trained from scratch on the data provided; and a pre-trained French language model, CamemBERT, that we fine-tuned on the dialect identification task. The latter method turned out to improve the macro-F1 score on the test set from 0.344 to 0.430 (25% increase), which indicates that transfer learning can be helpful for dialect identification.

Refining an Almost Clean Translation Memory Helps Machine Translation
Shivendra Bhardwa | David Alfonso-Hermelo | Philippe Langlais | Gabriel Bernier-Colborne | Cyril Goutte | Michel Simard
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

While recent studies have been dedicated to cleaning very noisy parallel corpora to improve Machine Translation training, we focus in this work on filtering a large and mostly clean Translation Memory. This problem of practical interest has not received much consideration from the community, in contrast with, for example, filtering large web-mined parallel corpora. We experiment with an extensive, multi-domain proprietary Translation Memory and compare five approaches involving deep-, feature-, and heuristic-based solutions. We propose two ways of evaluating this task, manual annotation and resulting Machine Translation quality. We report significant gains over a state-of-the-art, off-the-shelf cleaning system, using two MT engines.

2021

N-gram and Neural Models for Uralic Language Identification: NRC at VarDial 2021
Gabriel Bernier-Colborne | Serge Leger | Cyril Goutte
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

We describe the systems developed by the National Research Council Canada for the Uralic language identification shared task at the 2021 VarDial evaluation campaign. We evaluated two different approaches to this task: a probabilistic classifier exploiting only character 5-grams as features, and a character-based neural network pre-trained through self-supervision, then fine-tuned on the language identification task. The former method turned out to perform better, which casts doubt on the usefulness of deep learning methods for language identification, where they have yet to convincingly and consistently outperform simpler and less costly classification algorithms exploiting n-gram features.

2020

Challenges in Neural Language Identification: NRC at VarDial 2020
Gabriel Bernier-Colborne | Cyril Goutte
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects

We describe the systems developed by the National Research Council Canada for the Uralic language identification shared task at the 2020 VarDial evaluation campaign. Although our official results were well below the baseline, we show in this paper that this was not due to the neural approach to language identification in general, but to a flaw in the function we used to sample data for training and evaluation purposes. Preliminary experiments conducted after the evaluation period suggest that our neural approach to language identification can achieve state-of-the-art results on this task, although further experimentation is required.

Human or Neural Translation?
Shivendra Bhardwaj | David Alfonso Hermelo | Phillippe Langlais | Gabriel Bernier-Colborne | Cyril Goutte | Michel Simard
Proceedings of the 28th International Conference on Computational Linguistics

Deep neural models tremendously improved machine translation. In this context, we investigate whether distinguishing machine from human translations is still feasible. We trained and applied 18 classifiers under two settings: a monolingual task, in which the classifier only looks at the translation; and a bilingual task, in which the source text is also taken into consideration. We report on extensive experiments involving 4 neural MT systems (Google Translate, DeepL, as well as two systems we trained) and varying the domain of texts. We show that the bilingual task is the easiest one and that transfer-based deep-learning classifiers perform best, with mean accuracies around 85% in-domain and 75% out-of-domain .

HardEval: Focusing on Challenging Tokens to Assess Robustness of NER
Gabriel Bernier-Colborne | Phillippe Langlais
Proceedings of the Twelfth Language Resources and Evaluation Conference

To assess the robustness of NER systems, we propose an evaluation method that focuses on subsets of tokens that represent specific sources of errors: unknown words and label shift or ambiguity. These subsets provide a system-agnostic basis for evaluating specific sources of NER errors and assessing room for improvement in terms of robustness. We analyze these subsets of challenging tokens in two widely-used NER benchmarks, then exploit them to evaluate NER systems in both in-domain and out-of-domain settings. Results show that these challenging tokens explain the majority of errors made by modern NER systems, although they represent only a small fraction of test tokens. They also indicate that label shift is harder to deal with than unknown words, and that there is much more room for improvement than the standard NER evaluation procedure would suggest. We hope this work will encourage NLP researchers to adopt rigorous and meaningful evaluation methods, and will help them develop more robust models.

2019

Improving Cuneiform Language Identification with BERT
Gabriel Bernier-Colborne | Cyril Goutte | Serge Léger
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

We describe the systems developed by the National Research Council Canada for the Cuneiform Language Identification (CLI) shared task at the 2019 VarDial evaluation campaign. We compare a state-of-the-art baseline relying on character n-grams and a traditional statistical classifier, a voting ensemble of classifiers, and a deep learning approach using a Transformer network. We describe how these systems were trained, and analyze the impact of some preprocessing and model estimation decisions. The deep neural network achieved 77% accuracy on the test data, which turned out to be the best performance at the CLI evaluation, establishing a new state-of-the-art for cuneiform language identification.

NRC Parallel Corpus Filtering System for WMT 2019
Gabriel Bernier-Colborne | Chi-kiu Lo
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

We describe the National Research Council Canada team’s submissions to the parallel corpus filtering task at the Fourth Conference on Machine Translation.

2018

CRIM at SemEval-2018 Task 9: A Hybrid Approach to Hypernym Discovery
Gabriel Bernier-Colborne | Caroline Barrière
Proceedings of the 12th International Workshop on Semantic Evaluation

This report describes the system developed by the CRIM team for the hypernym discovery task at SemEval 2018. This system exploits a combination of supervised projection learning and unsupervised pattern-based hypernym discovery. It was ranked first on the 3 sub-tasks for which we submitted results.

2017

Fine-grained domain classification of text using TERMIUM Plus
Gabriel Bernier-Colborne | Caroline Barrière | Pierre André Ménard
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

2016

Combiner des modèles sémantiques distributionnels pour mieux détecter les termes évoquant le même cadre sémantique (Combining distributional semantic models to improve the identification of terms that evoke the same semantic frame)
Gabriel Bernier-Colborne | Patrick Drouin
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

Nous utilisons des modèles sémantiques distributionnels pour détecter des termes qui évoquent le même cadre sémantique. Dans cet article, nous vérifions si une combinaison de différents modèles permet d’obtenir une précision plus élevée qu’un modèle unique. Nous mettons à l’épreuve plusieurs méthodes simples pour combiner les mesures de similarité calculées à partir de chaque modèle. Les résultats indiquent qu’on obtient systématiquement une augmentation de la précision par rapport au meilleur modèle unique en combinant des modèles différents.

Évaluation des modèles sémantiques distributionnels : le cas de la dérivation syntaxique (Evaluation of distributional semantic models : The case of syntactic derivation )
Gabriel Bernier-Colborne | Patrick Drouin
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)

Nous évaluons deux modèles sémantiques distributionnels au moyen d’un jeu de données représentant quatre types de relations lexicales et analysons l’influence des paramètres des deux modèles. Les résultats indiquent que le modèle qui offre les meilleurs résultats dépend des relations ciblées, et que l’influence des paramètres des deux modèles varie considérablement en fonction de ce facteur. Ils montrent également que ces modèles captent aussi bien la dérivation syntaxique que la synonymie, mais que les configurations qui captent le mieux ces deux types de relations sont très différentes.

Evaluation of distributional semantic models: a holistic approach
Gabriel Bernier-Colborne | Patrick Drouin
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

We investigate how both model-related factors and application-related factors affect the accuracy of distributional semantic models (DSMs) in the context of specialized lexicography, and how these factors interact. This holistic approach to the evaluation of DSMs provides valuable guidelines for the use of these models and insight into the kind of semantic information they capture.

2015

La séparation des composantes lexicale et flexionnelle des vecteurs de mots
François Lareau | Gabriel Bernier-Colborne | Patrick Drouin
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

En sémantique distributionnelle, le sens des mots est modélisé par des vecteurs qui représentent leur distribution en corpus. Les modèles étant souvent calculés sur des corpus sans pré-traitement linguistique poussé, ils ne permettent pas de rendre bien compte de la compositionnalité morphologique des mots-formes. Nous proposons une méthode pour décomposer les vecteurs de mots en vecteurs lexicaux et flexionnels.

Exploration de modèles distributionnels au moyen de graphes 1-PPV
Gabriel Bernier-Colborne
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Dans cet article, nous montrons qu’un graphe à 1 plus proche voisin (graphe 1-PPV) offre différents moyens d’explorer les voisinages sémantiques captés par un modèle distributionnel. Nous vérifions si les composantes connexes de ce graphe, qui représentent des ensembles de mots apparaissant dans des contextes similaires, permettent d’identifier des ensembles d’unités lexicales qui évoquent un même cadre sémantique. Nous illustrons également différentes façons d’exploiter le graphe 1-PPV afin d’explorer un modèle ou de comparer différents modèles.

2014

Identifying semantic relations in a specialized corpus through distributional analysis of a cooccurrence tensor
Gabriel Bernier-Colborne
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

Extracting lexico-semantic relations from specialized corpora using a word space model (Analyse distributionnelle de corpus spécialisés pour l’identification de relations lexico-sémantiques) [in French]
Gabriel Bernier-Colborne
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

2012

Application d’un algorithme de traduction statistique à la normalisation de textos (Applying a Statistical Machine Translation Algorithm to SMS Text Message Normalization) [in French]
Gabriel Bernier-Colborne
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 3: RECITAL

Venues