Isabelle Gribomont


2024

Exploring hybrid approaches to readability: experiments on the complementarity between linguistic features and transformers
Rodrigo Wilkens | Patrick Watrin | Rémi Cardon | Alice Pintard | Isabelle Gribomont | Thomas François
Findings of the Association for Computational Linguistics: EACL 2024

Linguistic features make a strong contribution to automatic readability assessment (ARA) and have long served as an anchor between computational and theoretical models. As the field developed, however, research shifted toward Deep Learning (DL). In an attempt to reconcile the mixed results reported in this context, we present a systematic comparison of 6 hybrid approaches, alongside standard Machine Learning and DL approaches, on 4 corpora covering different languages and target audiences. The experiments clearly highlight two rather simple hybridization methods (soft labeling and simple concatenation), which also prove the most robust on smaller datasets and across tasks and languages. This study stands out as the first to systematically compare different architectures and approaches to feature hybridization in DL, and to compare performance across two languages and two target audiences, which leads to a clearer pattern of results.
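The simple-concatenation method highlighted above can be sketched as joining a transformer sentence embedding with a handcrafted linguistic feature vector before classification. This is a minimal illustration, not the paper's implementation; the feature names and dimensions below are invented for the example.

```python
import numpy as np

def concat_hybrid_features(cls_embedding, linguistic_features):
    """Simple-concatenation hybridization: join a transformer sentence
    embedding with a handcrafted linguistic feature vector, yielding a
    single input vector for a downstream readability classifier."""
    return np.concatenate([cls_embedding, linguistic_features])

# Toy example: a 768-dim [CLS] embedding and 3 illustrative linguistic
# features (e.g. mean sentence length, type-token ratio, mean word frequency).
cls_embedding = np.random.rand(768)
linguistic_features = np.array([17.2, 0.63, 4.1])
x = concat_hybrid_features(cls_embedding, linguistic_features)
print(x.shape)  # (771,)
```

The resulting vector can then be fed to any standard classifier, which is what makes this hybridization strategy so easy to apply.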

Exploration d’approches hybrides pour la lisibilité : expériences sur la complémentarité entre les traits linguistiques et les transformers
Rodrigo Wilkens | Patrick Watrin | Rémi Cardon | Alice Pintard | Isabelle Gribomont | Thomas François
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 2 : traductions d'articles publiés

Machine learning architectures based on handcrafted linguistic features have enjoyed considerable success in automatic readability assessment (ARA) and have helped bring computer science and psycholinguistic theory together. Recent developments, however, have turned toward deep learning and neural networks. In this article, we seek to reconcile the two approaches. We present a systematic comparison of 6 hybrid architectures (applied to several languages and audiences), which we compare against these two competing approaches. The experiments clearly highlight two hybridization methods: soft labeling and simple concatenation. These two architectures also prove more effective when training data is limited. This study is the first to systematically compare different hybrid architectures and to study their performance across several readability tasks.

2023

From Diachronic to Contextual Lexical Semantic Change: Introducing Semantic Difference Keywords (SDKs) for Discourse Studies
Isabelle Gribomont
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

This paper introduces the concept of Semantic Difference Keywords (SDKs). We define SDKs as keywords selected because of a comparatively high semantic difference between their use in two or more corpora. They are extracted by applying methods developed to identify diachronic Lexical Semantic Change. Like statistical keywords, most commonly used in quantitative discourse studies, SDKs capture the distinctiveness of a target corpus. However, they do not do so because they are used significantly more often or more consistently, but because they are used significantly differently. The case study presented in this paper shows that SDKs are successful in identifying concepts which are contested, i.e., sites of “semantic struggles” (CITATION). SDKs are therefore a useful contribution to (computational) discourse studies and text-based Digital Humanities more broadly.
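The core of the SDK extraction described above can be illustrated as ranking a shared vocabulary by how differently each word is used across two corpora, measured between corpus-specific embeddings. This is a toy sketch assuming the embeddings are already aligned in a shared space; the tiny vectors and words below are invented for illustration.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two word vectors (0 = identical use)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def semantic_difference_keywords(emb_a, emb_b, top_n=2):
    """Rank the shared vocabulary by how differently each word is used in
    the two corpora, measured as cosine distance between its corpus-specific
    embeddings (as in diachronic LSC detection)."""
    shared = set(emb_a) & set(emb_b)
    scored = {w: cosine_distance(emb_a[w], emb_b[w]) for w in shared}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

# Toy corpora: "liberty" is used very differently, "bread" identically.
emb_a = {"liberty": np.array([1.0, 0.0]), "bread": np.array([1.0, 1.0])}
emb_b = {"liberty": np.array([0.0, 1.0]), "bread": np.array([1.0, 1.0])}
print(semantic_difference_keywords(emb_a, emb_b, top_n=1))  # ['liberty']
```

Unlike frequency-based statistical keywords, the top-ranked words here need not be more frequent in the target corpus, only used differently.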

2022

CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification?
Rodrigo Wilkens | David Alfter | Rémi Cardon | Isabelle Gribomont | Adrien Bibal | Patrick Watrin | Marie-Catherine de Marneffe | Thomas François
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Lexical simplification is the task of substituting a difficult word with a simpler equivalent for a target audience. This is currently most commonly done by modeling lexical complexity on a continuous scale to identify simpler alternatives to difficult words. In the TSAR shared task, the organizers called for systems capable of generating substitutions in a zero-shot setting for English, Spanish, and Portuguese. In this paper, we present the solution we (the CENTAL team) proposed for the task. We explore the ability of BERT-like models to generate substitution words by masking the difficult word. To do so, we investigate various context-enhancement strategies, which we combine into an ensemble method. We also explore different substitution-ranking methods. We report on a post-submission analysis of the results and present our insights for potential improvements. The code for all our experiments is available at https://gitlab.com/Cental-FR/cental-tsar2022.
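The masking setup described above can be sketched with plain string handling: the difficult word is replaced by the model's mask token, and one common context-enhancement idea is to prepend the unmasked sentence so the model still sees the original word while filling the mask. This is a hedged stdlib-only illustration; the function names are hypothetical and the actual strategies and model calls are in the linked repository.

```python
def mask_target(sentence, target, mask_token="[MASK]"):
    """Replace the complex target word with the mask token so a BERT-like
    masked language model can propose substitutions for it."""
    return sentence.replace(target, mask_token, 1)

def with_copy_context(sentence, target, mask_token="[MASK]"):
    """One context-enhancement strategy: concatenate the original sentence
    with its masked copy, keeping the target word visible to the model
    (illustrative; not necessarily the team's exact formulation)."""
    return sentence + " " + mask_target(sentence, target, mask_token)

s = "The cat perched on the mat"
print(mask_target(s, "perched"))       # The cat [MASK] on the mat
print(with_copy_context(s, "perched"))
# The resulting string would then be fed to a masked-LM (e.g. via a
# fill-mask pipeline) to obtain ranked substitution candidates.
```

The candidates the model proposes for the mask position are then filtered and ranked, which is where the substitution-ranking methods mentioned above come in.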