Alexis Nasr

2025

Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations
Hichem Ammar Khodja | Frederic Bechet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)

This paper explores the robustness of language models (LMs) to variations in the temporal context within factual knowledge. It examines whether LMs can correctly associate a temporal context with a past fact valid over a defined period, by asking them to differentiate correct from incorrect contexts. The LMs’ ability to distinguish is analyzed along two dimensions: the distance of the incorrect context from the validity period and the granularity of the context. To this end, a dataset called TimeStress is introduced, enabling the evaluation of 18 diverse LMs. Results reveal that the best LM achieves a perfect distinction for only 11% of the studied facts, with errors, certainly rare, but critical that humans would not make. This work highlights the limitations of current LMs in temporal representation.

pdf bib abs

Factual Knowledge Assessment of Language Models Using Distractors
Hichem Ammar Khodja | Abderrahmane Ait gueni ssaid | Frederic Bechet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the 31st International Conference on Computational Linguistics

Language models encode extensive factual knowledge within their parameters. The accurate assessment of this knowledge is crucial for understanding and improving these models. In the literature, factual knowledge assessment often relies on cloze sentences, which can lead to erroneous conclusions due to the complexity of natural language (out-of-subject continuations, the existence of many correct answers and the several ways of expressing them). In this paper, we introduce a new interpretable knowledge assessment method that mitigates these issues by leveraging distractors—incorrect but plausible alternatives to the correct answer. We propose several strategies for retrieving distractors and determine the most effective one through experimentation. Our method is evaluated against existing approaches, demonstrating solid alignment with human judgment and stronger robustness to verbalization artifacts. The code and data to reproduce our experiments are available on GitHub.

pdf bib abs

Connaissances factuelles dans les modèles de langue : robustesse et anomalies face à des variations simples du contexte temporel
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Ce papier explore la robustesse des modèles de langue (ML) face aux variations du contexte temporel dans les connaissances factuelles. Il examine si les ML peuvent associer correctement un contexte temporel à un fait passé valide sur une période de temps délimitée, en leur demandant de différencier les contextes corrects des contextes incorrects. La capacité de distinction des ML est analysée sur deux dimensions : la distance du contexte incorrect par rapport à la période de validité et la granularité du contexte. Pour cela, un jeu de données, TimeStress, est introduit, permettant de tester 18 ML variés. Les résultats révèlent que le meilleur ML n’atteint une distinction parfaite que pour 11% des faits étudiés, avec des erreurs critiques qu’un humain ne ferait pas. Ces travaux soulignent les limites des ML actuels en matière de représentation temporelle.

pdf bib abs

Evaluating Pretrained Causal Language Models for Synonymy
Ioana Ivan | Carlos Ramisch | Alexis Nasr
Findings of the Association for Computational Linguistics: ACL 2025

The scaling of causal language models in size and training data enabled them to tackle increasingly complex tasks. Despite the development of sophisticated tests to reveal their new capabilities, the underlying basis of these complex skills remains unclear. We argue that complex skills might be explained using simpler ones, represented by linguistic concepts. As an initial step in exploring this hypothesis, we focus on the lexical-semantic concept of synonymy, laying the groundwork for research into its relationship with more complex skills. We develop a comprehensive test suite to assess various aspects of synonymy under different conditions, and evaluate causal open-source models ranging up to 10 billion parameters. We find that these models effectively recognize synonymy but struggle to generate synonyms when prompted with relevant context.

2024

pdf bib abs

WikiFactDiff: A Large, Realistic, and Temporally Adaptable Dataset for Atomic Factual Knowledge Update in Causal Language Models
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The factuality of large language model (LLMs) tends to decay over time since events posterior to their training are “unknown” to them. One way to keep models up-to-date could be factual update: the task of inserting, replacing, or removing certain simple (atomic) facts within the model. To study this task, we present WikiFactDiff, a dataset that describes the evolution of factual knowledge between two dates as a collection of simple facts divided into three categories: new, obsolete, and static. We describe several update scenarios arising from various combinations of these three types of basic update. The facts are represented by subject-relation-object triples; indeed, WikiFactDiff was constructed by comparing the state of the Wikidata knowledge base at 4 January 2021 and 27 February 2023. Those fact are accompanied by verbalization templates and cloze tests that enable running update algorithms and their evaluation metrics. Contrary to other datasets, such as zsRE and CounterFact, WikiFactDiff constitutes a realistic update setting that involves various update scenarios, including replacements, archival, and new entity insertions. We also present an evaluation of existing update algorithms on WikiFactDiff.

pdf bib abs

WikiFactDiff: Un Grand jeu de données Réaliste et Temporellement Adaptable pour la Mise à Jour Atomique des Connaissances Factuelles dans les Modèles de Langue Causaux
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecrové
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

La factualité des modèles de langue se dégrade avec le temps puisque les événements postérieurs à leur entraînement leur sont inconnus. Une façon de maintenir ces modèles à jour pourrait être la mise à jour factuelle à l’échelle de faits atomiques. Pour étudier cette tâche, nous présentons WikiFactDiff, un jeu de données qui représente les changements survenus entre deux dates sous la forme d’un ensemble de faits simples, sous format RDF, divisés en trois catégories : les faits à apprendre, les faits à conserver et les faits obsolètes. Ces faits sont verbalisés afin de permettre l’exécution des algorithmes de mise à jour et leur évaluation, qui est présentée dans ce document. Contrairement aux jeux de données existants, WikiFactDiff représente un cadre de mise à jour réaliste qui implique divers scénarios, notamment les remplacements de faits, leur archivage et l’insertion de nouvelles entités.

2023

pdf bib abs

Investigating the Effect of Relative Positional Embeddings on AMR-to-Text Generation with Structural Adapters
Sebastien Montella | Alexis Nasr | Johannes Heinecke | Frederic Bechet | Lina M. Rojas Barahona
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Text generation from Abstract Meaning Representation (AMR) has substantially benefited from the popularized Pretrained Language Models (PLMs). Myriad approaches have linearized the input graph as a sequence of tokens to fit the PLM tokenization requirements. Nevertheless, this transformation jeopardizes the structural integrity of the graph and is therefore detrimental to its resulting representation. To overcome this issue, Ribeiro et al. (2021b) have recently proposed StructAdapt, a structure-aware adapter which injects the input graph connectivity within PLMs using Graph Neural Networks (GNNs). In this paper, we investigate the influence of Relative Position Embeddings (RPE) on AMR-to-Text, and, in parallel, we examine the robustness of StructAdapt. Through ablation studies, graph attack and link prediction, we reveal that RPE might be partially encoding input graphs. We suggest further research regarding the role of RPE will provide valuable insights for Graph-to-Text generation.

2022

pdf bib abs

Transfer Learning and Masked Generation for Answer Verbalization
Sebastien Montella | Lina Rojas-Barahona | Frederic Bechet | Johannes Heinecke | Alexis Nasr
Proceedings of the Workshop on Structured and Unstructured Knowledge Integration (SUKI)

Structured Knowledge has recently emerged as an essential component to support fine-grained Question Answering (QA). In general, QA systems query a Knowledge Base (KB) to detect and extract the raw answers as final prediction. However, as lacking of context, language generation can offer a much informative and complete response. In this paper, we propose to combine the power of transfer learning and the advantage of entity placeholders to produce high-quality verbalization of extracted answers from a KB. We claim that such approach is especially well-suited for answer generation. Our experiments show 44.25%, 3.26% and 29.10% relative gain in BLEU over the state-of-the-art on the VQuAnDA, ParaQA and VANiLLa datasets, respectively. We additionally provide minor hallucinations corrections in VANiLLa standing for 5% of each of the training and testing set. We witness a median absolute gain of 0.81 SacreBLEU. This strengthens the importance of data quality when using automated evaluation.

pdf bib abs

Dependency Parsing with Backtracking using Deep Reinforcement Learning
Franck Dary | Maxime Petit | Alexis Nasr
Transactions of the Association for Computational Linguistics, Volume 10

Greedy algorithms for NLP such as transition-based parsing are prone to error propagation. One way to overcome this problem is to allow the algorithm to backtrack and explore an alternative solution in cases where new evidence contradicts the solution explored so far. In order to implement such a behavior, we use reinforcement learning and let the algorithm backtrack in cases where such an action gets a better reward than continuing to explore the current solution. We test this idea on both POS tagging and dependency parsing and show that backtracking is an effective means to fight against error propagation.

2021

pdf bib abs

The Reading Machine: A Versatile Framework for Studying Incremental Parsing Strategies
Franck Dary | Alexis Nasr
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

The Reading Machine, is a parsing framework that takes as input raw text and performs six standard nlp tasks: tokenization, pos tagging, morphological analysis, lemmatization, dependency parsing and sentence segmentation. It is built upon Transition Based Parsing, and allows to implement a large number of parsing configurations, among which a fully incremental one. Three case studies are presented to highlight the versatility of the framework. The first one explores whether an incremental parser is able to take into account top-down dependencies (i.e. the influence of high level decisions on low level ones), the second compares the performances of an incremental and a pipe-line architecture and the third quantifies the impact of the right context on the predictions made by an incremental parser.

pdf bib abs

TALEP at CMCL 2021 Shared Task: Non Linear Combination of Low and High-Level Features for Predicting Eye-Tracking Data
Franck Dary | Alexis Nasr | Abdellah Fourtassi
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

In this paper we describe our contribution to the CMCL 2021 Shared Task, which consists in predicting 5 different eye tracking variables from English tokenized text. Our approach is based on a neural network that combines both raw textual features we extracted from the text and parser-based features that include linguistic predictions (e.g. part of speech) and complexity metrics (e.g., entropy of parsing). We found that both the features we considered as well as the architecture of the neural model that combined these features played a role in the overall performance. Our system achieved relatively high accuracy on the test data of the challenge and was ranked 2nd out of 13 competing teams and a total of 30 submissions.

2020

pdf bib abs

SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings
Cindy Aloui | Carlos Ramisch | Alexis Nasr | Lucie Barque
Proceedings of the 28th International Conference on Computational Linguistics

Contextualised embeddings such as BERT have become de facto state-of-the-art references in many NLP applications, thanks to their impressive performances. However, their opaqueness makes it hard to interpret their behaviour. SLICE is a hybrid model that combines supersense labels with contextual embeddings. We introduce a weakly supervised method to learn interpretable embeddings from raw corpora and small lists of seed words. Our model is able to represent both a word and its context as embeddings into the same compact space, whose dimensions correspond to interpretable supersenses. We assess the model in a task of supersense tagging for French nouns. The little amount of supervision required makes it particularly well suited for low-resourced scenarios. Thanks to its interpretability, we perform linguistic analyses about the predicted supersenses in terms of input word and context representations.

2019

pdf bib abs

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations
Frederic Bechet | Cindy Aloui | Delphine Charlet | Geraldine Damnati | Johannes Heinecke | Alexis Nasr | Frederic Herledan
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document. Recently very large corpora (SQuAD, MS MARCO) containing triplets (document, question, answer) were made available to the scientific community to develop supervised methods based on deep neural networks with promising results. These methods need very large training corpus to be efficient, however such kind of data only exists for English and Chinese at the moment. The aim of this study is the development of such resources for other languages by proposing to generate in a semi-automatic way questions from the semantic Frame analysis of large corpora. The collect of natural questions is reduced to a validation/test set. We applied this method on the CALOR-Frame French corpus to develop the CALOR-QUEST resource presented in this paper.

pdf bib abs

Typological Features for Multilingual Delexicalised Dependency Parsing
Manon Scholivet | Franck Dary | Alexis Nasr | Benoit Favre | Carlos Ramisch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The existence of universal models to describe the syntax of languages has been debated for decades. The availability of resources such as the Universal Dependencies treebanks and the World Atlas of Language Structures make it possible to study the plausibility of universal grammar from the perspective of dependency parsing. Our work investigates the use of high-level language descriptions in the form of typological features for multilingual dependency parsing. Our experiments on multilingual parsing for 40 languages show that typological information can indeed guide parsers to share information between similar languages beyond simple language identification.

pdf bib abs

CALOR-QUEST : un corpus d’entraînement et d’évaluation pour la compréhension automatique de textes (Machine reading comprehension is a task related to Question-Answering where questions are not generic in scope but are related to a particular document)
Frederic Bechet | Cindy Aloui | Delphine Charlet | Geraldine Damnati | Johannes Heinecke | Alexis Nasr | Frederic Herledan
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

La compréhension automatique de texte est une tâche faisant partie de la famille des systèmes de Question/Réponse où les questions ne sont pas à portée générale mais sont liées à un document particulier. Récemment de très grand corpus (SQuAD, MS MARCO) contenant des triplets (document, question, réponse) ont été mis à la disposition de la communauté scientifique afin de développer des méthodes supervisées à base de réseaux de neurones profonds en obtenant des résultats prometteurs. Ces méthodes sont cependant très gourmandes en données d’apprentissage, données qui n’existent pour le moment que pour la langue anglaise. Le but de cette étude est de permettre le développement de telles ressources pour d’autres langues à moindre coût en proposant une méthode générant de manière semi-automatique des questions à partir d’une analyse sémantique d’un grand corpus. La collecte de questions naturelle est réduite à un ensemble de validation/test. L’application de cette méthode sur le corpus CALOR-Frame a permis de développer la ressource CALOR-QUEST présentée dans cet article.

Alexis Nasr

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2004

2002

2000

1998

1997

1995

Co-authors

Venues