Denis Paperno

2025

pdf bib abs
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Yingjin Song | Yupei Du | Denis Paperno | Albert Gatt
Findings of the Association for Computational Linguistics: ACL 2025

This paper introduces the TempVS benchmark, which focuses on temporal grounding and reasoning capabilities of Multimodal Large Language Models (MLLMs) in image sequences. TempVS consists of three main tests (i.e., event relation inference, sentence ordering and image ordering), each accompanied with a basic grounding test. TempVS requires MLLMs to rely on both visual and linguistic modalities to understand the temporal order of events. We evaluate 38 state-of-the-art MLLMs, demonstrating that models struggle to solve TempVS, with a substantial performance gap compared to human capabilities. We also provide fine-grained insights that suggest promising directions for future research. Our TempVS benchmark data and code are available at https://github.com/yjsong22/TempVS.

pdf bib abs
Hypernetworks for Perspectivist Adaptation
Daniil Ignatev | Denis Paperno | Massimo Poesio
Proceedings of the The 4th Workshop on Perspectivist Approaches to NLP

The task of perspective-aware classification introduces a bottleneck in terms of parametric efficiency that did not get enough recognition in existing studies. In this article, we aim to address this issue by applying an existing architecture, the hypernetwork+adapters combination, to perspectivist classification. Ultimately, we arrive at a solution that can compete with specialized models in adopting user perspectives on hate speech and toxicity detection, while also making use of considerably fewer parameters. Our solution is architecture-agnostic and can be applied to a wide range of base models out of the box.

pdf bib abs
Empathy vs Neutrality: Designing and Evaluating a Natural Chatbot for the Healthcare Domain
Cristina Reguera-Gómez | Denis Paperno | Maaike H. T. de Boer
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

As lifestyle-related diseases rise due to unhealthy habits such as smoking, poor diet, lack of exercise, and alcohol consumption, the role of Conversational AI in healthcare is increasingly significant. This study provides an empirical study on the design and evaluation of a natural and intuitive healthcare chatbot, specifically focusing on the impact of empathetic responses on user experience regarding lifestyle changes. Findings reveal a strong preference for the empathetic chatbot, with results showing statistical significance (p <0.001), highlighting the importance of empathy in enhancing user interaction with healthcare chatbots.

pdf bib
Annotator disagreement in RST annotation schemes
Daniil Ignatev | Denis Paperno | Massimo Poesio
Proceedings of the Society for Computation in Linguistics 2025

pdf bib abs
West Germanic noun-noun compounds and the morphology-syntax trade-off
Pablo Mosteiro | Damián Blasi | Denis Paperno
Proceedings of the 22nd SIGMORPHON workshop on Computational Morphology, Phonology, and Phonetics

This paper examines the linguistic distinction between syntax and morphology, focusing on noun-noun compounds in three West Germanic languages (English, Dutch, and German). Previous studies using the Parallel Bible Corpus have found a trade-off between word order (syntax) and word structure (morphology), with languages optimizing information conveyance through these systems. Our research question is whether manipulating English noun-noun compounds to resemble Dutch and German constructions can reproduce the observed distance between these languages in the order-structure plane. We extend a word-pasting procedure to merge increasingly common noun-noun pairs in English Bible translations. After each merge, we estimate the information contained in word order and word structure using entropy calculations. Our results show that pasting noun-noun pairs reduces the difference between English and the other languages, suggesting that orthographic conventions defining word boundaries play a role in this distinction. However, the effect is not pronounced, and results are statistically inconclusive.

2024

pdf bib abs
Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning
Yingjin Song | Denis Paperno | Albert Gatt
Proceedings of the 17th International Natural Language Generation Conference

Visual storytelling systems generate multi-sentence stories from image sequences. In this task, capturing contextual information and bridging visual variation bring additional challenges. We propose a simple yet effective framework that leverages the generalization capabilities of pretrained foundation models, only training a lightweight vision-language mapping network to connect modalities, while incorporating context to enhance coherence. We introduce a multimodal contrastive objective that also improves visual relevance and story informativeness. Extensive experimental results, across both automatic metrics and human evaluations, demonstrate that the stories generated by our framework are diverse, coherent, informative, and interesting.

2023

pdf bib abs
Grounded and well-rounded: a methodological approach to the study of cross-modal and cross-lingual grounding
Timothee Mickus | Elaine Zosa | Denis Paperno
Findings of the Association for Computational Linguistics: EMNLP 2023

Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems. Literature has divided into two camps: While some argue that grounding allows for qualitatively different generalizations, others believe it can be compensated by mono-modal data quantity. Limited empirical evidence has emerged for or against either position, which we argue is due to the methodological challenges that come with studying grounding and its effects on NLP systems. In this paper, we establish a methodological framework for studying what the effects are—if any—of providing models with richer input sources than text-only. The crux of it lies in the construction of comparable samples of populations of models trained on different input modalities, so that we can tease apart the qualitative effects of different input sources from quantifiable model performances. Experiments using this framework reveal qualitative differences in model behavior between cross-modally grounded, cross-lingually grounded, and ungrounded models, which we measure both at a global dataset level as well as for specific word representations, depending on how concrete their semantics is.

pdf bib
The Scenario Refiner: Grounding subjects in images at the morphological level
Claudia C. Tagliaferri | Denis Paperno | Albert Gatt | Sofia Axioti
Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing

pdf bib abs
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations
Alexey Tikhonov | Lisa Bylinina | Denis Paperno
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Multimodal embeddings aim to enrich the semantic information in neural representations of language compared to text-only models. While different embeddings exhibit different applicability and performance on downstream tasks, little is known about the systematic representation differences attributed to the visual modality. Our paper compares word embeddings from three vision-and-language models (CLIP, OpenCLIP and Multilingual CLIP, Radford et al. 2021; Ilharco et al. 2021; Carlsson et al. 2022) and three text-only models, with static (FastText, Bojanowski et al. 2017) as well as contextual representations (multilingual BERT Devlin et al. 2018; XLM-RoBERTa, Conneau et al. 2019). This is the first large-scale study of the effect of visual grounding on language representations, including 46 semantic parameters. We identify meaning properties and relations that characterize words whose embeddings are most affected by the inclusion of visual modality in the training data; that is, points where visual grounding turns out most important. We find that the effect of visual modality correlates most with denotational semantic properties related to concreteness, but is also detected for several specific semantic classes, as well as for valence, a sentiment-related connotational property of linguistic expressions.

pdf bib abs
„Mann“ is to “Donna” as「国王」is to « Reine » Adapting the Analogy Task for Multilingual and Contextual Embeddings
Timothee Mickus | Eduardo Calò | Léo Jacqmin | Denis Paperno | Mathieu Constant
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

How does the word analogy task fit in the modern NLP landscape? Given the rarity of comparable multilingual benchmarks and the lack of a consensual evaluation protocol for contextual models, this remains an open question. In this paper, we introduce MATS: a multilingual analogy dataset, covering forty analogical relations in six languages, and evaluate human as well as static and contextual embedding performances on the task. We find that not all analogical relations are equally straightforward for humans, static models remain competitive with contextual embeddings, and optimal settings vary across languages and analogical relations. Several key challenges remain, including creating benchmarks that align with human reasoning and understanding what drives differences across methodologies.

2022

pdf bib abs
On Learning Interpreted Languages with Recurrent Models
Denis Paperno
Computational Linguistics, Volume 48, Issue 2 - June 2022

Can recurrent neural nets, inspired by human sequential data processing, learn to understand language? We construct simplified data sets reflecting core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. We find LSTM and GRU networks to generalize to compositional interpretation well, but only in the most favorable learning settings, with a well-paced curriculum, extensive training data, and left-to-right (but not right-to-left) composition.

pdf bib abs
Semeval-2022 Task 1: CODWOE – Comparing Dictionaries and Word Embeddings
Timothee Mickus | Kees Van Deemter | Mathieu Constant | Denis Paperno
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Word embeddings have advanced the state of the art in NLP across numerous tasks. Understanding the contents of dense neural representations is of utmost interest to the computational semantics community. We propose to focus on relating these opaque word vectors with human-readable definitions, as found in dictionaries This problem naturally divides into two subtasks: converting definitions into embeddings, and converting embeddings into definitions. This task was conducted in a multilingual setting, using comparable sets of embeddings trained homogeneously.

pdf bib abs
How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus | Denis Paperno | Mathieu Constant
Transactions of the Association for Computational Linguistics, Volume 10

Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.

2020

pdf bib abs
Chinese Long and Short Form Choice Exploiting Neural Network Language Modeling Approaches
Lin Li | Kees van Deemter | Denis Paperno
Proceedings of the 19th Chinese National Conference on Computational Linguistics

This paper presents our work in long and short form choice, a significant question of lexical choice, which plays an important role in many Natural Language Understanding tasks. Long and short form sharing at least one identical word meaning but with different number of syllables is a highly frequent linguistic phenomenon in Chinese like 老虎-虎(laohu-hu, tiger)

pdf bib abs
Geo-Aware Image Caption Generation
Sofia Nikiforova | Tejaswini Deoskar | Denis Paperno | Yoad Winter
Proceedings of the 28th International Conference on Computational Linguistics

Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system’s ability to produce captions with relevant and factually accurate geographic referencing.

pdf bib abs
What Meaning-Form Correlation Has to Compose With: A Study of MFC on Artificial and Natural Language
Timothee Mickus | Timothée Bernard | Denis Paperno
Proceedings of the 28th International Conference on Computational Linguistics

Compositionality is a widely discussed property of natural languages, although its exact definition has been elusive. We focus on the proposal that compositionality can be assessed by measuring meaning-form correlation. We analyze meaning-form correlation on three sets of languages: (i) artificial toy languages tailored to be compositional, (ii) a set of English dictionary definitions, and (iii) a set of English sentences drawn from literature. We find that linguistic phenomena such as synonymy and ungrounded stop-words weigh on MFC measurements, and that straightforward methods to mitigate their effects have widely varying results depending on the dataset they are applied to. Data and code are made publicly available.

pdf bib abs
Génération automatique de définitions pour le français (Definition Modeling in French)
Timothee Mickus | Mathieu Constant | Denis Paperno
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

La génération de définitions est une tâche récente qui vise à produire des définitions lexicographiques à partir de plongements lexicaux. Nous remarquons deux lacunes : (i) l’état de l’art actuel ne s’est penché que sur l’anglais et le chinois, et (ii) l’utilisation escomptée en tant que méthode d’évaluation des plongements lexicaux doit encore être vérifiée. Pour y remédier, nous proposons un jeu de données pour la génération de définitions en français, ainsi qu’une évaluation des performances d’un modèle de génération de définitions simple selon les plongements lexicaux fournis en entrée.

pdf bib
What do you mean, BERT?
Timothee Mickus | Denis Paperno | Mathieu Constant | Kees van Deemter
Proceedings of the Society for Computation in Linguistics 2020

pdf bib
Limitations in learning an interpreted language with recurrent models
Denis Paperno
Proceedings of the Society for Computation in Linguistics 2020

2019

pdf bib
Distributional Effects of Gender Contrasts Across Categories
Timothee Mickus | Olivier Bonami | Denis Paperno
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

pdf bib abs
Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling
Timothee Mickus | Denis Paperno | Matthieu Constant
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

Defining words in a textual context is a useful task both for practical purposes and for gaining insight into distributed word representations. Building on the distributional hypothesis, we argue here that the most natural formalization of definition modeling is to treat it as a sequence-to-sequence task, rather than a word-to-sequence task: given an input sequence with a highlighted word, generate a contextually appropriate definition for it. We implement this approach in a Transformer-based sequence-to-sequence model. Our proposal allows to train contextualization and definition generation in an end-to-end fashion, which is a conceptual improvement over earlier works. We achieve state-of-the-art results both in contextual and non-contextual definition modeling.

pdf bib abs
Choosing between Long and Short Word Forms in Mandarin
Lin Li | Kees van Deemter | Denis Paperno | Jingyu Fan
Proceedings of the 12th International Conference on Natural Language Generation

Between 80% and 90% of all Chinese words have long and short form such as 老虎/虎 (lao-hu/hu , tiger) (Duanmu:2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG. Following an earlier work on abbreviations in English (Mahowald et al, 2013), we bring a probabilistic perspective to these questions, using both a behavioral and a corpus-based approach. We hypothesized that there is a higher probability of choosing short form in supportive context than in neutral context in Mandarin. Consistent with our prediction, our findings revealed that predictability of contexts makes effect on speakers’ long and short form choice.

2018

pdf bib
Generalizing Representations of Lexical Semantic Relations
Anupama Chingacham | Denis Paperno
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

pdf bib abs
SemEval-2018 Task 10: Capturing Discriminative Attributes
Alicia Krebs | Alessandro Lenci | Denis Paperno
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the SemEval 2018 Task 10 on Capturing Discriminative Attributes. Participants were asked to identify whether an attribute could help discriminate between two concepts. For example, a successful system should determine that ‘urine’ is a discriminating feature in the word pair ‘kidney’, ‘bone’. The aim of the task is to better evaluate the capabilities of state of the art semantic models, beyond pure semantic similarity. The task attracted submissions from 21 teams, and the best system achieved a 0.75 F1 score.

pdf bib abs
Limitations in learning an interpreted language with recurrent models
Denis Paperno
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

In this submission I report work in progress on learning simplified interpreted languages by means of recurrent models. The data is constructed to reflect core properties of natural language as modeled in formal syntax and semantics. Preliminary results suggest that LSTM networks do generalise to compositional interpretation, albeit only in the most favorable learning setting.

2016

pdf bib
Squibs: When the Whole Is Less Than the Sum of Its Parts: How Composition Affects PMI Values in Distributional Semantic Vectors
Denis Paperno | Marco Baroni
Computational Linguistics, Volume 42, Issue 2 - June 2016

pdf bib
There Is No Logical Negation Here, But There Are Alternatives: Modeling Conversational Negation with Distributional Semantics
Germán Kruszewski | Denis Paperno | Raffaella Bernardi | Marco Baroni
Computational Linguistics, Volume 42, Issue 4 - December 2016

pdf bib abs
Typology of Adjectives Benchmark for Compositional Distributional Models
Daria Ryzhova | Maria Kyuseva | Denis Paperno
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present a novel application of compositional distributional semantic models (CDSMs): prediction of lexical typology. The paper introduces the notion of typological closeness, which is a novel rigorous formalization of semantic similarity based on comparison of multilingual data. Starting from the Moscow Database of Qualitative Features for adjective typology, we create four datasets of typological closeness, on which we test a range of distributional semantic models. We show that, on the one hand, vector representations of phrases based on data from one language can be used to predict how words within the phrase translate into different languages, and, on the other hand, that typological data can serve as a semantic benchmark for distributional models. We find that compositional distributional models, especially parametric ones, perform way above non-compositional alternatives on the task.

pdf bib
When Hyperparameters Help: Beneficial Parameter Combinations in Distributional Semantic Models
Alicia Krebs | Denis Paperno
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib
Capturing Discriminative Attributes in a Distributional Space: Task Proposal
Alicia Krebs | Denis Paperno
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

2015

pdf bib abs
Deriving Boolean structures from distributional vectors
German Kruszewski | Denis Paperno | Marco Baroni
Transactions of the Association for Computational Linguistics, Volume 3

Corpus-based distributional semantic models capture degrees of semantic relatedness among the words of very large vocabularies, but have problems with logical phenomena such as entailment, that are instead elegantly handled by model-theoretic approaches, which, in turn, do not scale up. We combine the advantages of the two views by inducing a mapping from distributional vectors of words (or sentences) into a Boolean structure of the kind in which natural language terms are assumed to denote. We evaluate this Boolean Distributional Semantic Model (BDSM) on recognizing entailment between words and sentences. The method achieves results comparable to a state-of-the-art SVM, degrades more gracefully when less training data are available and displays interesting qualitative properties.

pdf bib
Leveraging Preposition Ambiguity to Assess Compositional Distributional Models of Semantics
Samuel Ritter | Cotie Long | Denis Paperno | Marco Baroni | Matthew Botvinick | Adele Goldberg
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf bib
Distributional Semantics in Use
Raffaella Bernardi | Gemma Boleda | Raquel Fernández | Denis Paperno
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics