Yves Lepage

2025

AnaScore: Understanding Semantic Parallelism in Proportional Analogies
Liyan Wang | Haotong Wang | Yves Lepage
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Formulaic criteria for proportional analogies, which capture relational mappings between two ratios of terms, are mainly confined to the formal level. As analogy datasets grow more complex, especially in evaluating the cognitive abilities of Large Language Models (LLMs), assessing parallelism in them becomes increasingly challenging and often requires human annotation. In this work, we propose AnaScore, an automatic metric for evaluating the strength of semantic parallelism in sentence analogies. AnaScore systematically provides formalized explanations for shared relational patterns at the level of conceptual knowledge. We apply AnaScore to annotate several existing datasets, considering different directions of the relations, and uncover artifacts in data construction. Our experiments with various LLMs demonstrate the efficacy of the AnaScore metric in capturing the inherent quality of analogical relationships, showing a positive correlation between analogy quality and model performance. Thanks to this metric, we clearly demonstrate that formally explainable examples are more beneficial for analogical reasoning, while ambiguous analogies with no clear criterion tend to hinder inference.

Q&A-LF: A French Question-Answering Benchmark for Measuring Fine-Grained Lexical Knowledge
Alexander Petrov | Alessandra Thais Mancas | Viviane Binet | Antoine Venant | François Lareau | Yves Lepage | Philippe Langlais
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

We introduce Q&A-LF, a French question-answering benchmark designed to assess the extent to which large language models capture fine-grained lexical knowledge. We investigate the ability of ChatGPT-4o mini, Qwen2.5-14B, Llama3.0-8B, and Llama3.1-8B to answer questions based on lexical functions (LFs) from Meaning-Text Theory. Using various prompting setups with different levels of examples and context, we find that Qwen and ChatGPT generally outperform Llama models, achieving up to 70% accuracy, while Llama models reach just above 60%. We identify LFs that are particularly easy or especially challenging for the models. We further investigate whether providing sentence-level context and one-shot prompting improve performance, especially on semantically complex functions.

ALF : Un jeu de données d’analogies françaises à grain fin pour l’évaluation de la connaissance lexicale des grands modèles de langue
Alexander Petrov | Antoine Venant | François Lareau | Yves Lepage | Philippe Langlais
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

The revolution brought about by large language models (LLMs) stems from the astonishing fluency of the texts they generate. This fluency raises an essential scientific question: how much lexical knowledge do LLMs actually capture in order to produce such fluent language? To answer it, we present ALF, a freely available analogy dataset with rich lexicographic information grounded in Meaning-Text Theory. It contains 2,600 fine-grained lexical analogies with which we evaluate the lexical ability of four standard LLMs: ChatGPT-4o mini, Llama3.0-8B, Llama3.1-8B, and Qwen2.5-14B. On average, ChatGPT and the Llama series reach an accuracy of around 55%, while Qwen is just below the 60% threshold, which shows that ALF poses a considerable challenge. We further identify certain types of analogies and prompting methods that reveal performance disparities.

2024

Continued Pre-training on Sentence Analogies for Translation with Small Data
Liyan Wang | Haotong Wang | Yves Lepage
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper introduces Continued Pre-training on Analogies (CPoA) to equip pre-trained language models with analogical abilities, aiming to improve performance in low-resource translation without data augmentation. We continue training the models on sentence analogies retrieved from a translation corpus. Considering the sparsity of analogy in corpora, especially in low-resource scenarios, we propose exploring approximate analogies between sentences. We attempt to find sentence analogies that might not conform to formal criteria over entire sentences but do over parts of them. When training the models, we introduce a weighting scalar pertaining to the quality of analogies to adjust their influence: emphasizing closer analogies while diminishing the impact of distant ones. We evaluate our approach on a low-resource translation task: German-Upper Sorbian. The results show that CPoA using 10 times fewer instances can effectively attain gains of +1.4 and +1.3 BLEU points over the original model in two translation directions. This improvement is more pronounced when there are fewer parallel examples.
