2024
MultiPICo: Multilingual Perspectivist Irony Corpus
Silvia Casola, Simona Frenda, Soda Marem Lo, Erhan Sezerer, Antonio Uva, Valerio Basile, Cristina Bosco, Alessandro Pedrani, Chiara Rubagotti, Viviana Patti, Davide Bernardi
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recently, several scholars have contributed to the growth of a new theoretical framework in NLP called perspectivism. This approach aims to leverage data annotated by different individuals to model the diverse perspectives that shape their opinions on subjective phenomena such as irony. In this context, we propose MultiPICo, a multilingual perspectivist corpus of ironic short conversations in different languages and linguistic varieties, extracted from Twitter and Reddit. The corpus includes sociodemographic information about its annotators. Our analysis of the annotated corpus shows how different demographic cohorts may significantly disagree in their annotation of irony, and how certain cultural factors influence both the perception of the phenomenon and the agreement on its annotation. Moreover, we show how disaggregated annotations and rich annotator metadata can be exploited to benchmark the ability of large language models to recognize irony, their positionality with respect to sociodemographic groups, and the efficacy of perspective-taking prompting for irony detection in multiple languages.
Data Augmentation through Back-Translation for Stereotypes and Irony Detection
Tom Bourgeade, Silvia Casola, Adel Mahmoud Wizan, Cristina Bosco
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Complex linguistic phenomena such as stereotypes or irony are still challenging to detect, particularly due to the limited availability of annotated data. In this paper, we explore Back-Translation (BT) as a data augmentation method to enhance such datasets by artificially introducing semantics-preserving variations. We investigate French and Italian as source languages on two multilingual datasets annotated for the presence of stereotypes or irony, and evaluate French/Italian, English, and Arabic as pivot languages for the BT process. We also investigate cross-translation, i.e., augmenting one language subset of a multilingual dataset with translated instances from the other languages. We conduct an intrinsic evaluation of the quality of back-translated instances, identifying linguistic or translation-model-specific errors that may occur with BT. We also perform an extrinsic evaluation of different data augmentation configurations used to train a multilingual Transformer-based classifier for stereotype or irony detection on monolingual data.
PERSEID - Perspectivist Irony Detection: A CALAMITA Challenge
Valerio Basile, Silvia Casola, Simona Frenda, Soda Marem Lo
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Works in perspectivism and human label variation have emphasized the need to collect and leverage various voices and points of view throughout the whole Natural Language Processing pipeline. PERSEID places itself in this line of work. We consider the task of irony detection in short Italian social media conversations collected from Twitter (X) and Reddit. To do so, we leverage data from MultiPICo, a recent multilingual dataset with disaggregated annotations and annotators’ metadata, containing 1000 (post, reply) pairs with five annotations each on average. We aim to evaluate whether prompting LLMs with additional information about the annotators’ demographics (namely gender only, age only, and the combination of the two) improves performance compared to a baseline in which only the input text is provided. The evaluation is zero-shot, and results are computed against the disaggregated annotations using F1.
GFG - Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda, Andrea Piergentili, Beatrice Savoldi, Marco Madeddu, Martina Rosola, Silvia Casola, Chiara Ferrando, Viviana Patti, Matteo Negri, Luisa Bentivogli
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Gender-fair language aims to promote gender equality by using terms and expressions that include all identities and avoid reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-marked languages such as Italian. To address this, the Gender-Fair Generation challenge intends to support the shift toward gender-fair language in written communication. The challenge, designed to assess and monitor the recognition and generation of gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of gender-fair language in automatic translation from English to Italian. The challenge relies on three annotated datasets: the GFL-it corpus, which contains Italian texts extracted from administrative documents provided by the University of Brescia; GeNTE, a bilingual test set for gender-neutral rewriting and translation built upon a subset of the Europarl dataset; and Neo-GATE, a bilingual test set designed to assess the use of non-binary neomorphemes in Italian for both fair reformulation and translation tasks. Finally, each task is evaluated with specific metrics: for task 1, the average F1 score obtained with BERTScore computed on each dataset entry; for tasks 2 and 3, accuracy measured with a gender-neutral classifier and a coverage-weighted accuracy.
I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models
Pier Felice Balestrucci, Silvia Casola, Soda Marem Lo, Valerio Basile, Alessandro Mazzei
Findings of the Association for Computational Linguistics: EMNLP 2024
Generating ironic content is challenging: it requires a nuanced understanding of context and implicit references, as well as a balance between seriousness and playfulness. Moreover, irony is highly subjective and can depend on various factors, such as social, cultural, or generational aspects. This paper explores whether Large Language Models (LLMs) can learn to generate ironic responses to social media posts. To do so, we fine-tune two models to generate ironic and non-ironic content and analyze in depth their outputs’ linguistic characteristics, their connection to the original post, and their similarity to the human-written replies. We also conduct a large-scale human evaluation of the outputs. Additionally, we investigate whether LLMs can learn a form of irony tied to a generational perspective, with mixed results.
2023
Confidence-based Ensembling of Perspective-aware Models
Silvia Casola, Soda Marem Lo, Valerio Basile, Simona Frenda, Alessandra Teresa Cignarella, Viviana Patti, Cristina Bosco
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Research in the field of NLP has recently focused on the variability that people show in selecting labels when performing an annotation task. Exploiting disagreements in annotations has been shown to offer advantages for accurate modelling and fair evaluation. In this paper, we propose a strongly perspectivist model for supervised classification of natural language utterances. Our approach combines the predictions of several perspective-aware models using key information of their individual confidence to capture the subjectivity encoded in the annotation of linguistic phenomena. We validate our method through experiments on two case studies, irony and hate speech detection, in in-domain and cross-domain settings. The results show that confidence-based ensembling of perspective-aware models seems beneficial for classification performance in all scenarios. In addition, we demonstrate the effectiveness of our method with automatically extracted perspectives from annotations when the annotators’ metadata are not available.
2022
What’s in a (dataset’s) name? The case of BigPatent
Silvia Casola, Alberto Lavelli, Horacio Saggion
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Sharing datasets and benchmarks has been crucial for rapidly improving Natural Language Processing models and systems. Documenting datasets’ characteristics (and any modification introduced over time) is equally important to avoid confusion and make comparisons reliable. Here, we describe the case of BigPatent, a dataset for patent summarization that exists in at least two rather different versions under the same name. While previous literature has not clearly distinguished among versions, their differences do not only lie at the surface level but also modify the dataset’s core nature and, thus, the complexity of the summarization task. While this paper describes a specific case, we aim to shed light on new challenges that might emerge in resource sharing and advocate for comprehensive documentation of datasets and models.
Exploring the limits of a base BART for multi-document summarization in the medical domain
Ishmael Obonyo, Silvia Casola, Horacio Saggion
Proceedings of the Third Workshop on Scholarly Document Processing
This paper describes our participation in the Multi-document Summarization for Literature Review (MSLR) Shared Task, in which we explore summarization models to create an automatic review of scientific results. Rather than maximizing the metrics using computationally expensive models, we placed ourselves in a setting of scarce computational resources and explored the limits of base sequence-to-sequence models (and thus a limited input length) for the task. Although we explore methods to feed the abstractive model with salient sentences only (using a first extractive step), we find that the results still leave room for improvement.
2020
FBK@SMM4H2020: RoBERTa for Detecting Medications on Twitter
Silvia Casola, Alberto Lavelli
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
This paper describes a classifier, based on a pretrained transformer, for tweets that mention medications or supplements. We developed this system for our participation in Subtask 1 of the Social Media Mining for Health Applications workshop, which featured an extremely unbalanced dataset. The model showed promising results, with an F1 of 0.8 (task mean: 0.66).