Arianna Muti


MNLP@Multilingual Counterspeech Generation: Evaluating Translation and Background Knowledge Filtering
Emanuele Moscato | Arianna Muti | Debora Nozza
Proceedings of the First Workshop on Multilingual Counterspeech Generation

We describe our participation in the Multilingual Counterspeech Generation shared task, which consists of generating a counternarrative to a hateful sentence, given that sentence and relevant background knowledge. We tested two factors: translating outputs generated in English versus generating outputs directly in the original languages, and filtering the provided background knowledge versus including all of it. Our experiments show that filtering the background knowledge within the same prompt and keeping the data in the original languages leads to more adherent counternarratives, except for Basque, where translating the output from English and filtering the background knowledge in a separate prompt yields better results. Our system ranked first in English, Italian, and Spanish and fourth in Basque.


GroningenAnnotatesGaza at the FIGNEWS 2024 Shared Task: Analyzing Bias in Conflict Narratives
Khalid Khatib | Sara Gemelli | Saskia Heisterborg | Pritha Majumdar | Gosse Minnema | Arianna Muti | Noa Solissa
Proceedings of The Second Arabic Natural Language Processing Conference

In this paper, we report the development of our annotation methodology for the FIGNEWS 2024 shared task. The objective of the shared task is to examine the layers of bias in how the war on Gaza is represented in media narratives. Our methodology follows the prescriptive paradigm, in which guidelines are detailed and refined through an iterative process where edge cases are discussed until the annotators converge. Our IAA score (Krippendorff’s α) is 0.420, highlighting the challenging and subjective nature of the task. Our results show that 52% of posts were unbiased, 42% biased against Palestine, 5% biased against Israel, and 3% biased against both; 16% were unclear or not applicable.
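
The agreement figure above is Krippendorff's α. As a hedged illustration (not the annotation team's code), the sketch below computes α for nominal labels from a coincidence matrix, assuming each post's annotations are treated as unordered categorical labels; the function name and input format are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    `units` is a list of items; each item is the list of labels that the
    annotators assigned to it (missing annotations are simply left out).
    """
    # Coincidence matrix: every ordered pair of labels given to the same item
    # by two different annotators, weighted by 1 / (m_u - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single label are not pairable
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals n_c and grand total n of pairable values.
    n_c = Counter()
    for (a, _), weight in coincidences.items():
        n_c[a] += weight
    n = sum(n_c.values())

    # Observed vs. chance-expected disagreement (nominal distance is 0 or 1).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - (n - 1) * observed / expected
```

On a toy set of three posts, each labeled by two annotators, `krippendorff_alpha_nominal([["unbiased", "unbiased"], ["unbiased", "biased"], ["biased", "biased"]])` gives 4/9 ≈ 0.444: a single disagreement already pulls α well below 1, which is one reason values around 0.4 are common for subjective annotation tasks.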

Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts
Arianna Muti | Federico Ruggeri | Khalid Al Khatib | Alberto Barrón-Cedeño | Tommaso Caselli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We frame misogyny detection as an Argumentative Reasoning task and investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings that encode the misogyny. Our study uses argumentation theory as the foundation for a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short in reasoning about misogynistic comments and that, to generate implied assumptions, they mostly rely on implicit knowledge derived from internalized common stereotypes about women rather than on inductive reasoning.

A Corpus for Sentence-Level Subjectivity Detection on English News Articles
Francesco Antici | Federico Ruggeri | Andrea Galassi | Katerina Korre | Arianna Muti | Alessandra Bardi | Alice Fedotova | Alberto Barrón-Cedeño
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task.

PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets
Arianna Muti | Federico Ruggeri | Cagri Toraman | Alberto Barrón-Cedeño | Samuel Algherini | Lorenzo Musetti | Silvia Ronchi | Gianmarco Saretto | Caterina Zapparoli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets, and disambiguating the meaning of such terms might help the detection of misogyny. To address this task, we present PejorativITy, a novel corpus of 1,200 manually annotated Italian tweets for pejorative language at the word level and misogyny at the sentence level. We evaluate the impact of injecting information about disambiguated words into a model targeting misogyny detection. In particular, we explore two approaches to injection: concatenation of pejorative information and substitution of ambiguous words with univocal terms. Our experimental results, both on our corpus and on two popular benchmarks of Italian tweets, show that both approaches lead to a major classification improvement, indicating that word sense disambiguation is a promising preliminary step for misogyny detection. Furthermore, we investigate LLMs’ understanding of pejorative epithets by means of contextual word embedding analysis and prompting.
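
The two injection strategies can be pictured as simple text transformations applied before classification. The sketch below is a hedged illustration, not the paper's implementation: it assumes the set of words used pejoratively in a tweet is already known (e.g. from word-level annotation or a disambiguation step), and the lexicon, the "[SEP]" separator, and the note format are hypothetical choices.

```python
# Hypothetical lexicon: ambiguous epithet -> unambiguous paraphrase.
# "cozza" (literally "mussel") is used here only as an illustrative entry.
LEXICON = {"cozza": "donna brutta"}


def concatenate(text, pejorative_words, sep=" [SEP] "):
    """Strategy 1: append disambiguation information to the input text."""
    if not pejorative_words:
        return text
    note = ", ".join(f"{w} = pejorative" for w in pejorative_words)
    return f"{text}{sep}{note}"


def substitute(text, pejorative_words, lexicon=LEXICON):
    """Strategy 2: replace each word used pejoratively with a univocal term.

    Tokens are split on whitespace for simplicity, so punctuation attached
    to a word prevents a match.
    """
    return " ".join(
        lexicon.get(tok, tok) if tok in pejorative_words else tok
        for tok in text.split()
    )
```

For example, `substitute("sei una cozza", {"cozza"})` returns `"sei una donna brutta"`, while the literal use in `substitute("mangio una cozza", set())` is left untouched; that contrast between pejorative and literal readings is what the disambiguation step is meant to capture.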

The Challenges of Creating a Parallel Multilingual Hate Speech Corpus: An Exploration
Katerina Korre | Arianna Muti | Alberto Barrón-Cedeño
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Hate speech is infamously one of the most demanding topics in Natural Language Processing, as its multifacetedness is accompanied by a handful of challenges, such as multilinguality and cross-linguality. Hate speech has a subjective aspect that intensifies across different cultures and languages. In this respect, we design a pipeline to explore the possibility of creating a parallel multilingual hate speech dataset using machine translation. In this paper, we evaluate whether and how this is feasible by assessing the quality of the translations, calculating the toxicity levels of the original and target texts, and calculating correlations between the newly obtained scores. Finally, we perform a qualitative analysis to gain further semantic and grammatical insights. With this pipeline, we explore ways of filtering hate speech texts in order to parallelize sentences in multiple languages, examining the challenges of the task.


On the Identification and Forecasting of Hate Speech in Inceldom
Paolo Gajo | Arianna Muti | Katerina Korre | Silvia Bernardini | Alberto Barrón-Cedeño
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Spotting hate speech in social media posts is crucial to increasing the civility of the Web and has been thoroughly explored in the NLP community. For the first time, we introduce a multilingual corpus for the analysis and identification of hate speech in the domain of inceldom, built from incel Web forums in English and Italian and including expert annotation at the post level for two kinds of hate speech: misogyny and racism. This resource paves the way for the development of mono- and cross-lingual models for (a) the identification of hateful (misogynous and racist) posts and (b) forecasting the number of hateful responses that a post is likely to trigger. Our experiments aim at improving the performance of Transformer-based models using masked language modeling pre-training and dataset merging. The results show that these strategies boost the models’ performance in all settings (binary classification, multi-label classification, and forecasting), especially in the cross-lingual scenarios.

UniBoe’s at SemEval-2023 Task 10: Model-Agnostic Strategies for the Improvement of Hate-Tuned and Generative Models in the Classification of Sexist Posts
Arianna Muti | Francesco Fernicola | Alberto Barrón-Cedeño
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present our submission to SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS). We address all three tasks: Task A consists of identifying whether a post is sexist. If so, Task B attempts to assign it one of four categories: threats, derogation, animosity, and prejudiced discussions. Task C aims at an even more fine-grained classification over 11 classes. Our team, UniBoe’s, experiments with fine-tuning hate-tuned Transformer-based models and priming generative models. In addition, we explore model-agnostic strategies, such as data augmentation techniques combined with active learning, as well as obfuscation of identity terms. Our official submissions obtain an F1 score of 0.83 for Task A, 0.58 for Task B, and 0.32 for Task C.


A Checkpoint on Multilingual Misogyny Identification
Arianna Muti | Alberto Barrón-Cedeño
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

We address the problem of identifying misogyny in tweets in mono- and multilingual settings in three languages: English, Italian, and Spanish. We explore model variations considering single and multiple languages both in the pre-training of the transformer and in the training of the downstream task, to assess the feasibility of detecting misogyny through a transfer learning approach across multiple languages. That is, we train monolingual transformers with monolingual data, and multilingual transformers with both monolingual and multilingual data. Our models reach state-of-the-art performance on all three languages. The single-language BERT models perform best, closely followed by different configurations of multilingual BERT models. Performance drops in zero-shot classification across languages. Our error analysis shows that multilingual and monolingual models tend to make the same mistakes.

Misogyny and Aggressiveness Tend to Come Together and Together We Address Them
Arianna Muti | Francesco Fernicola | Alberto Barrón-Cedeño
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We target the complementary binary tasks of identifying whether a tweet is misogynous and, if that is the case, whether it is also aggressive. We compare two ways to address these problems: a multi-class model that discriminates among all the classes at once (not misogynous, non-aggressive misogynous, and aggressive misogynous), and a cascaded approach where the binary classifications (misogynous vs. non-misogynous and aggressive vs. non-aggressive) are carried out separately and then joined together. For the latter, two training and three testing scenarios are considered. Our models are built on top of AlBERTo and are evaluated in the framework of the Evalita 2020 shared task on automatic misogyny and aggressiveness identification in Italian tweets. Our cascaded models, including the strong naïve baseline, significantly outperform the top submissions to Evalita, reaching state-of-the-art performance without relying on any external information.
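
Joining the two binary outputs of the cascaded approach into the three classes of the multi-class formulation could look like the sketch below. It is an illustration under an assumption not stated in the abstract: a negative misogyny prediction overrides the aggressiveness prediction, since aggressiveness is only defined for misogynous tweets. The function and label names are hypothetical.

```python
NOT_MISOGYNOUS = "not misogynous"
NON_AGGRESSIVE = "non-aggressive misogynous"
AGGRESSIVE = "aggressive misogynous"


def join_cascade(is_misogynous, is_aggressive):
    """Combine two independent binary predictions into one of three labels.

    Assumption: aggressiveness only applies to misogynous tweets, so a
    negative misogyny prediction overrides the aggressiveness prediction.
    """
    if not is_misogynous:
        return NOT_MISOGYNOUS
    return AGGRESSIVE if is_aggressive else NON_AGGRESSIVE


def join_all(misogyny_preds, aggressiveness_preds):
    """Apply join_cascade element-wise to two aligned lists of predictions."""
    if len(misogyny_preds) != len(aggressiveness_preds):
        raise ValueError("prediction lists must have the same length")
    return [join_cascade(m, a) for m, a in zip(misogyny_preds, aggressiveness_preds)]
```

One consequence of this design is that the two binary classifiers can be trained and evaluated independently, while the joined output remains directly comparable to the single multi-class model.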

LeaningTower@LT-EDI-ACL2022: When Hope and Hate Collide
Arianna Muti | Marta Marchiori Manerba | Katerina Korre | Alberto Barrón-Cedeño
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

The 2022 edition of LT-EDI proposed two tasks in various languages. Task Hope Speech Detection required models for the automatic identification of hopeful comments for equality, diversity, and inclusion. Task Homophobia/Transphobia Detection focused on the identification of homophobic and transphobic comments. We targeted both tasks in English by using reinforced BERT-based approaches. Our core strategy aimed at exploiting the data available for each given task to augment the amount of supervised instances in the other. On the basis of an active learning process, we trained a model on the dataset for Task i and applied it to the dataset for Task j to iteratively integrate new silver data for Task i. Our official submissions to the shared task obtained a macro-averaged F1 score of 0.53 for Hope Speech and 0.46 for Homo/Transphobia, placing our team in third and fourth position out of 11 and 12 participating teams, respectively.
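
The cross-task augmentation loop can be approximated by a confidence-thresholded self-training sketch such as the one below. This is a simplification, not the team's exact active learning procedure: `fit`, `predict_proba`, the 0.9 threshold, and the number of rounds are all hypothetical placeholders for whatever classifier and selection criterion are actually used.

```python
def add_silver_data(gold_i, texts_j, fit, predict_proba, threshold=0.9, rounds=3):
    """Grow the training set of task i with confident predictions on task-j texts.

    gold_i: list of (text, label) pairs annotated for task i
    texts_j: texts from the task-j dataset, unlabeled with respect to task i
    fit(train) -> model; predict_proba(model, text) -> list of class probabilities
    (both callables are hypothetical stand-ins for a real classifier)
    """
    train = list(gold_i)
    pool = list(texts_j)
    for _ in range(rounds):
        model = fit(train)
        remaining = []
        for text in pool:
            probs = predict_proba(model, text)
            label = max(range(len(probs)), key=lambda k: probs[k])
            if probs[label] >= threshold:
                train.append((text, label))  # confident: keep as silver data
            else:
                remaining.append(text)  # uncertain: retry in the next round
        if len(remaining) == len(pool):
            break  # no new silver instances in this round, so stop early
        pool = remaining
    return train
```

Retraining after each round lets instances added earlier influence which task-j texts become confident later; the threshold trades the amount of silver data against its label noise.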

UniBO at SemEval-2022 Task 5: A Multimodal bi-Transformer Approach to the Binary and Fine-grained Identification of Misogyny in Memes
Arianna Muti | Katerina Korre | Alberto Barrón-Cedeño
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

We present our submission to SemEval-2022 Task 5 on Multimedia Automatic Misogyny Identification. We address both tasks: Task A consists of identifying whether a meme is misogynous. If so, Task B attempts to identify its kind among shaming, stereotyping, objectification, and violence. Our approach combines a BERT Transformer with CLIP for the textual and visual representations. Both textual and visual encoders are fused in an early-fusion fashion through a Multimodal Bidirectional Transformer with unimodally pretrained components. Our official submissions obtain a macro-averaged F1 of 0.727 in Task A (4th position out of 69 participants) and a weighted F1 of 0.710 in Task B (4th position out of 42 participants).