Stefano Menini


2021

Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators’ Disagreement
Elisa Leonardelli | Stefano Menini | Alessio Palmero Aprosio | Marco Guerini | Sara Tonelli
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle the problem from an algorithmic perspective, so as to reduce the need for annotated data, less attention has been paid to the quality of these data. Following a trend that has emerged recently, we focus on the level of agreement among annotators while selecting data to create offensive language datasets, a task involving a high level of subjectivity. Our study comprises the creation of three novel datasets of English tweets covering different topics and having five crowd-sourced judgments each. We also present an extensive set of experiments showing that selecting training and test data according to different levels of annotators’ agreement has a strong effect on classifiers’ performance and robustness. Our findings are further validated in cross-domain experiments and studied using a popular benchmark dataset. We show that hard cases, where low agreement is present, are not necessarily due to poor-quality annotation, and we advocate for a higher presence of ambiguous cases in future datasets, in order to train more robust systems and better account for the different points of view expressed online.
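The abstract describes selecting data according to annotator agreement over five judgments per tweet. As a rough illustration only, assuming binary labels (1 = offensive, 0 = not offensive), items can be bucketed by the size of the majority: with five binary votes it can only be 5 (unanimous), 4 or 3. The bucket names and helper functions below are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical bucket names keyed by majority size (valid for 5 binary votes).
BUCKETS = {5: "unanimous", 4: "mild", 3: "weak"}


def agreement_level(judgments):
    """Return (majority_label, majority_size) for an odd number of binary judgments."""
    if len(judgments) % 2 == 0:
        raise ValueError("expected an odd number of judgments")
    label, count = Counter(judgments).most_common(1)[0]
    return label, count


def split_by_agreement(items):
    """Group (text, judgments) pairs into agreement buckets, keeping the majority label."""
    buckets = {name: [] for name in BUCKETS.values()}
    for text, judgments in items:
        label, size = agreement_level(judgments)
        if size not in BUCKETS:
            raise ValueError(f"unexpected majority size {size}; assumes 5 binary votes")
        buckets[BUCKETS[size]].append((text, label))
    return buckets


if __name__ == "__main__":
    items = [
        ("tweet a", [1, 1, 1, 1, 1]),
        ("tweet b", [0, 0, 0, 0, 1]),
        ("tweet c", [1, 0, 1, 0, 0]),
    ]
    for name, bucket in split_by_agreement(items).items():
        print(name, bucket)
```

Training and test splits could then be drawn from whichever buckets a given experiment requires, e.g. training only on unanimous items and testing on weak-agreement ones.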

FrameNet-like Annotation of Olfactory Information in Texts
Sara Tonelli | Stefano Menini
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Although olfactory references play a crucial role in our cultural memory, only a few works in NLP have tried to capture them from a computational perspective. Currently, the main challenge is not so much the development of technological components for olfactory information extraction, given recent advances in semantic processing and natural language understanding, but rather the lack of a theoretical framework to capture this information from a linguistic point of view, as a preliminary step towards the development of automated systems. Therefore, in this work we present the annotation guidelines, developed with the help of history scholars and domain experts, aimed at capturing all the relevant elements involved in olfactory situations or events described in texts. These guidelines have been inspired by FrameNet annotation, but underwent some adaptations, which are detailed in this paper. Furthermore, we present a case study concerning the annotation of olfactory situations in English historical travel writings describing trips to Italy. An analysis of the most frequent role fillers shows that olfactory descriptions pertain to some typical domains such as religion, food, nature, the ancient past and poor sanitation, all supporting the creation of a stereotypical imagery related to Italy. On the other hand, positive feelings triggered by smells are prevalent, and contribute to framing travels to Italy as an exciting experience involving all senses.

2020

FBK-DH at SemEval-2020 Task 12: Using Multi-channel BERT for Multilingual Offensive Language Detection
Camilla Casula | Alessio Palmero Aprosio | Stefano Menini | Sara Tonelli
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper we present our submission to sub-task A at SemEval 2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval2). For Danish, Turkish, Arabic and Greek, we develop an architecture based on transfer learning and relying on a two-channel BERT model, in which the English BERT and the multilingual one are combined after creating a machine-translated parallel corpus for each language in the task. For English, instead, we adopt a more standard, single-channel approach. We find that, in a multilingual scenario where some languages have little training data, using parallel BERT models with machine-translated data can give systems more stability, especially when dealing with noisy data. The fact that machine translation of social media data may not be perfect does not hurt the overall classification performance.

Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection
Michele Corazza | Stefano Menini | Elena Cabrio | Sara Tonelli | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal of performing zero-shot abusive language detection. We compare the results obtained with the original MLM to the ones obtained by our method, showing improved performance on German, Italian and Spanish.

2019

A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Third Workshop on Abusive Language Online

Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, identifying cyberbullying phenomena at scale is still an unsolved problem. This is because abusive language detection on textual messages must be coupled with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built from Instagram messages, and we describe the cyberbullying monitoring user interface.
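The abstract couples message classification with network analysis so that repeated attacks against the same person can be identified. A minimal sketch of that aggregation step, assuming a classifier has already labeled each message as abusive or not and that every message has a known sender and target; the `Message` type, the `flag_repeated_targets` function and the `min_attacks` threshold are hypothetical and do not describe the system's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    target: str
    abusive: bool  # assumed to come from an upstream message classifier


def flag_repeated_targets(messages, min_attacks=3):
    """Return {target: [senders of abusive messages]} for targets attacked at least min_attacks times."""
    attacks = defaultdict(list)
    for m in messages:
        if m.abusive:
            attacks[m.target].append(m.sender)
    return {t: senders for t, senders in attacks.items() if len(senders) >= min_attacks}


if __name__ == "__main__":
    msgs = [
        Message("a", "v", True),
        Message("b", "v", True),
        Message("a", "v", True),
        Message("c", "w", True),
        Message("d", "w", False),
    ]
    print(flag_repeated_targets(msgs))
```

Keeping the list of senders (rather than a bare count) leaves room for a later network-analysis step, e.g. checking whether the attacks come from one user or from a group.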

2018

Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying
Rachele Sprugnoli | Stefano Menini | Sara Tonelli | Filippo Oncini | Enrico Piras
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

Although WhatsApp is used by teenagers as a major channel of cyberbullying, such interactions remain invisible due to the app’s privacy policies, which do not allow ex-post data collection. Indeed, most of the information on these phenomena relies on surveys based on self-reported data. In order to overcome this limitation, we describe in this paper the activities that led to the creation of a WhatsApp dataset to study cyberbullying among Italian students aged 12-13. We present not only the collected chats with annotations about user role and type of offense, but also the living lab created in a collaboration between researchers and schools to monitor and analyse cyberbullying. Finally, we discuss some open issues, dealing with ethical, operational and epistemic aspects.

2017

Topic-Based Agreement and Disagreement in US Electoral Manifestos
Stefano Menini | Federico Nanni | Simone Paolo Ponzetto | Sara Tonelli
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a topic-based analysis of agreement and disagreement in political manifestos, which relies on a new method for topic detection based on key concept clustering. Our approach outperforms both standard techniques like LDA and a state-of-the-art graph-based method, and provides promising initial results for this new task in computational social science.

RAMBLE ON: Tracing Movements of Popular Historical Figures
Stefano Menini | Rachele Sprugnoli | Giovanni Moretti | Enrico Bignotti | Sara Tonelli | Bruno Lepri
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present RAMBLE ON, an application integrating a pipeline for frame-based information extraction and an interface to track and display movement trajectories. The code of the extraction pipeline and a navigator are freely available; moreover, a demonstrator displays the outcome of a case study carried out on the trajectories of notable persons of the XX century.

2016

Agreement and Disagreement: Comparison of Points of View in the Political Domain
Stefano Menini | Sara Tonelli
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The automated comparison of points of view between two politicians is a very challenging task, due not only to the lack of annotated resources, but also to the different dimensions participating in the definition of agreement and disagreement. In order to shed light on this complex task, we first carry out a pilot study to manually annotate the components involved in detecting agreement and disagreement. Then, based on these findings, we implement different features to capture them automatically via supervised classification. We do not focus on debates in dialogical form; rather, we consider sets of documents in which politicians may express their position with respect to different topics in an implicit or explicit way, as during an electoral campaign. We create and make available three different datasets.

“Who was Pietro Badoglio?” Towards a QA system for Italian History
Stefano Menini | Rachele Sprugnoli | Antonio Uva
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents QUANDHO (QUestion ANswering Data for italian HistOry), an Italian question answering dataset created to cover a specific domain, i.e. the history of Italy in the first half of the XX century. The dataset includes questions manually classified and annotated with Lexical Answer Types, and a set of question-answer pairs. This resource, freely available for research purposes, has been used to retrain a domain-independent question answering system so as to improve its performance in the domain of interest. Ongoing experiments on the development of a question classifier and an automatic tagger of Lexical Answer Types are also presented.

2014

SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment
Marco Marelli | Luisa Bentivogli | Marco Baroni | Raffaella Bernardi | Stefano Menini | Roberto Zamparelli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

A SICK cure for the evaluation of compositional distributional semantic models
Marco Marelli | Stefano Menini | Marco Baroni | Luisa Bentivogli | Raffaella Bernardi | Roberto Zamparelli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Shared and internationally recognized benchmarks are fundamental for the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored for them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic and semantic phenomena that CDSMs are expected to account for, but do not require dealing with other aspects of existing sentential data sets (idiomatic multiword expressions, named entities, telegraphic language) that are not within the scope of CDSMs. By means of crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as gold score) and entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK data set was used in SemEval-2014 Task 1, and it is freely available for research purposes.