Semeval-2022 Task 1: CODWOE – Comparing Dictionaries and Word Embeddings
Timothee Mickus | Kees van Deemter | Mathieu Constant | Denis Paperno
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Word embeddings have advanced the state of the art in NLP across numerous tasks. Understanding the contents of dense neural representations is of utmost interest to the computational semantics community. We propose to focus on relating these opaque word vectors to human-readable definitions, as found in dictionaries. This problem naturally divides into two subtasks: converting definitions into embeddings, and converting embeddings into definitions. This task was conducted in a multilingual setting, using comparable sets of embeddings trained homogeneously.

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus | Denis Paperno | Mathieu Constant
Transactions of the Association for Computational Linguistics, Volume 10

Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attention and feed-forward sublayers are not equally useful in all downstream applications, and we give a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.
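
One way to see why such a reframing is possible: in a residual architecture, each sublayer adds its output to a running vector, so the final vector is the input plus everything the sublayers added. The sketch below is a minimal illustration under loud assumptions: the sublayers are hypothetical random tanh maps (not real attention or feed-forward layers), even slots are labeled "attention" and odd slots "feed-forward" purely for illustration, and layer normalization and bias handling, which a real Transformer decomposition has to account for, are left out entirely.

```python
import math
import random

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def vsum(vectors, dim):
    """Sum a list of vectors, starting from the zero vector."""
    total = [0.0] * dim
    for v in vectors:
        total = add(total, v)
    return total

def make_toy_sublayer(dim, rng):
    """Hypothetical stand-in for a sublayer: random linear map followed by tanh."""
    W = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
    def sublayer(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return sublayer

def run_stack(x0, sublayers):
    """Residual update x <- x + f(x); also record what each sublayer added."""
    x, added = list(x0), []
    for f in sublayers:
        out = f(x)
        added.append(out)
        x = add(x, out)
    return x, added

rng = random.Random(0)
dim, n_blocks = 4, 3
# Even indices play the "attention" role, odd ones the "feed-forward" role (labels only).
sublayers = [make_toy_sublayer(dim, rng) for _ in range(2 * n_blocks)]
x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
final, added = run_stack(x0, sublayers)

attn_term = vsum(added[0::2], dim)  # total contribution of "attention" slots
ff_term = vsum(added[1::2], dim)    # total contribution of "feed-forward" slots
reassembled = add(add(x0, attn_term), ff_term)

# Largest coordinate-wise gap between the stack output and input + attention + feed-forward.
print(max(abs(a - b) for a, b in zip(final, reassembled)))
```

The printed gap is only floating-point rounding: without layer normalization, the output vector splits exactly into an input term plus one term per component type, which is what makes it possible to study each component's share of the embedding.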


Génération automatique de définitions pour le français (Definition Modeling in French)
Timothee Mickus | Mathieu Constant | Denis Paperno
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Definition modeling is a recent task that aims to produce lexicographic definitions from word embeddings. We note two gaps: (i) the current state of the art has only addressed English and Chinese, and (ii) its intended use as an evaluation method for word embeddings has yet to be verified. To address these gaps, we propose a dataset for definition modeling in French, as well as an evaluation of how the performance of a simple definition modeling system varies with the word embeddings provided as input.

What Meaning-Form Correlation Has to Compose With: A Study of MFC on Artificial and Natural Language
Timothee Mickus | Timothée Bernard | Denis Paperno
Proceedings of the 28th International Conference on Computational Linguistics

Compositionality is a widely discussed property of natural languages, although its exact definition has been elusive. We focus on the proposal that compositionality can be assessed by measuring meaning-form correlation. We analyze meaning-form correlation on three sets of languages: (i) artificial toy languages tailored to be compositional, (ii) a set of English dictionary definitions, and (iii) a set of English sentences drawn from literature. We find that linguistic phenomena such as synonymy and ungrounded stop-words weigh on MFC measurements, and that straightforward methods to mitigate their effects have widely varying results depending on the dataset they are applied to. Data and code are made publicly available.
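
Meaning-form correlation is commonly computed by taking every pair of items in a language, measuring how far apart their meanings are and how far apart their forms are, and correlating the two lists of distances. The sketch below assumes Hamming distance over meaning feature tuples, Levenshtein distance over form strings, and Pearson correlation; these are illustrative choices, and the distances used in the paper may differ. Significance is often assessed with a Mantel permutation test, which is omitted here.

```python
import itertools
import math

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = curr
    return prev[-1]

def hamming(u, v) -> int:
    """Number of positions where two equal-length meaning tuples differ."""
    return sum(x != y for x, y in zip(u, v))

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length number lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def meaning_form_correlation(language):
    """language: list of (meaning, form) pairs; correlate distances over all item pairs."""
    meaning_d, form_d = [], []
    for (m1, f1), (m2, f2) in itertools.combinations(language, 2):
        meaning_d.append(hamming(m1, m2))
        form_d.append(levenshtein(f1, f2))
    return pearson(meaning_d, form_d)

# Toy compositional language: each meaning feature is spelled by one character slot.
compositional = [((0, 0), "aa"), ((0, 1), "ab"), ((1, 0), "ba"), ((1, 1), "bb")]
print(meaning_form_correlation(compositional))
```

In this toy language the form distances mirror the meaning distances exactly, so the correlation is 1 (up to floating-point rounding). Natural-language data departs from this ideal, for instance when synonyms have very different forms, which is the kind of effect the abstract reports as weighing on MFC measurements.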

What do you mean, BERT?
Timothee Mickus | Denis Paperno | Mathieu Constant | Kees van Deemter
Proceedings of the Society for Computation in Linguistics 2020


Distributional Effects of Gender Contrasts Across Categories
Timothee Mickus | Olivier Bonami | Denis Paperno
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling
Timothee Mickus | Denis Paperno | Mathieu Constant
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

Defining words in a textual context is a useful task both for practical purposes and for gaining insight into distributed word representations. Building on the distributional hypothesis, we argue here that the most natural formalization of definition modeling is to treat it as a sequence-to-sequence task, rather than a word-to-sequence task: given an input sequence with a highlighted word, generate a contextually appropriate definition for it. We implement this approach in a Transformer-based sequence-to-sequence model. Our proposal allows contextualization and definition generation to be trained in an end-to-end fashion, which is a conceptual improvement over earlier works. We achieve state-of-the-art results in both contextual and non-contextual definition modeling.
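
To make the sequence-to-sequence framing concrete, the sketch below builds (source, target) training pairs in which the word to define is highlighted inside its context. The marker tokens `<t>` and `</t>` and the helper names are hypothetical; the model in the paper may highlight the word differently, for instance inside its embedding layer rather than with extra tokens.

```python
from typing import List, Tuple

START, END = "<t>", "</t>"  # hypothetical marker tokens around the highlighted word

def highlight(tokens: List[str], start: int, end: int) -> List[str]:
    """Wrap tokens[start:end] (the word to define) in marker tokens."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("invalid span for the highlighted word")
    return tokens[:start] + [START] + tokens[start:end] + [END] + tokens[end:]

def make_example(context: str, start: int, end: int,
                 definition: str) -> Tuple[List[str], List[str]]:
    """Build one (source, target) pair for a sequence-to-sequence model."""
    source = highlight(context.split(), start, end)
    target = definition.split()
    return source, target

# Contextual example: the context disambiguates which sense of "bank" is meant.
src, tgt = make_example("she sat on the bank of the river", 4, 5,
                        "sloping land beside a body of water")
print(" ".join(src))  # she sat on the <t> bank </t> of the river
```

Under this framing, non-contextual definition modeling is simply the special case where the context is the word alone: `make_example("bank", 0, 1, ...)` yields the source `<t> bank </t>`, so a single model can cover both settings.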