Debora Nozza

2025

Probing Feminist Representations: A Study of Bias in LLMs and Word Embeddings
Arianna Muti | Elisa Bassignana | Emanuele Moscato | Debora Nozza
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Biased Tales: Cultural and Topic Bias in Generating Children’s Stories
Donya Rooein | Vilém Zouhar | Debora Nozza | Dirk Hovy
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Stories play a pivotal role in human communication, shaping beliefs and morals, particularly in children. As parents increasingly rely on large language models (LLMs) to craft bedtime stories, the presence of cultural and gender stereotypes in these narratives raises significant concerns. To address this issue, we present Biased Tales, a comprehensive dataset designed to analyze how biases influence protagonists’ attributes and story elements in LLM-generated stories. Our analysis uncovers striking disparities. When the protagonist is described as a girl (as compared to a boy), appearance-related attributes increase by 55.26%. Stories featuring non-Western children emphasize cultural heritage, tradition, and family themes far more than those featuring Western children. Our findings highlight the need to address sociocultural bias in order to make creative AI use more equitable and diverse.

Personalization up to a Point: Why Personalized Content Moderation Needs Boundaries, and How We Can Enforce Them
Emanuele Moscato | Tiancheng Hu | Matthias Orlikowski | Paul Röttger | Debora Nozza
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Personalized content moderation can protect users from harm while facilitating free expression by tailoring moderation decisions to individual preferences rather than enforcing universal rules. However, content moderation that is fully personalized to individual preferences, no matter what these preferences are, may lead to even the most hazardous types of content being propagated on social media. In this paper, we explore this risk using hate speech as a case study. Certain types of hate speech are illegal in many countries. We show that, while fully personalized hate speech detection models increase overall user welfare (as measured by user-level classification performance), they also make predictions that violate such legal hate speech boundaries, especially when tailored to users who tolerate highly hateful content. To address this problem, we enforce legal boundaries in personalized hate speech detection by overriding predictions from personalized models with those from a boundary classifier. This approach significantly reduces legal violations while minimally affecting overall user welfare. Our findings highlight both the promise and the risks of personalized moderation, and offer a practical solution to balance user preferences with legal and ethical obligations.
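
The override rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: both models are treated as binary classifiers that return `True` for content that should be hidden, and the boundary classifier is assumed to override the personalized model only when it flags content as crossing a legal boundary. The names `moderate`, `personalized`, and `boundary` are hypothetical.

```python
from typing import Callable

# Hypothetical classifier signature: text -> True ("hide it") or False ("show it").
Classifier = Callable[[str], bool]


def moderate(text: str, personalized: Classifier, boundary: Classifier) -> bool:
    """Return True if `text` should be hidden for this user.

    Assumed rule: if the boundary classifier flags the text as crossing a legal
    boundary, its prediction wins; otherwise the personalized prediction is kept.
    """
    if boundary(text):
        return True  # legal boundary enforced regardless of user tolerance
    return personalized(text)  # otherwise respect the user's preferences
```

Under this rule, a user who tolerates everything (`personalized` always returns `False`) still has boundary-flagged content hidden, while content the boundary classifier accepts is left to personal preference.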

Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models Capabilities on Iranian Social Norm Classification
Hamidreza Saffari | Mohammadamin Shafiei | Donya Rooein | Francesco Pierri | Debora Nozza
Findings of the Association for Computational Linguistics: NAACL 2025

Creating globally inclusive AI systems demands datasets that reflect diverse social norms. Iran, with its unique cultural blend, offers an ideal case study, with Farsi adding linguistic complexity. In this work, we introduce the Iranian Social Norms (ISN) dataset, a novel collection of 1,699 Iranian social norms annotated with environments, demographic features, and scope, alongside English translations. Our evaluation of six Large Language Models (LLMs) on classifying Iranian social norms, using a variety of prompts, uncovered critical insights into the impact of geographic and linguistic context. Results revealed a substantial performance gap in LLMs’ comprehension of Iranian norms. Notably, while adding geographic context to English prompts improved performance, this effect was absent in Farsi, pointing to nuanced linguistic challenges. In particular, performance was significantly worse for Iran-specific norms, emphasizing the importance of culturally tailored datasets. As the first Farsi dataset for social norm classification, ISN will facilitate crucial cross-cultural analyses, shedding light on how values differ across contexts and cultures.

The “r” in “woman” stands for rights. Auditing LLMs in Uncovering Social Dynamics in Implicit Misogyny
Arianna Muti | Chris Emmery | Debora Nozza | Alberto Barrón-Cedeño | Tommaso Caselli
Findings of the Association for Computational Linguistics: EMNLP 2025

Persistent societal biases like misogyny express themselves more often implicitly than through openly hostile language. However, previous misogyny studies have focused primarily on explicit language, overlooking these more subtle forms. We bridge this gap by examining implicit misogynistic expressions in English and Italian. First, we develop a taxonomy of social dynamics, i.e., the underlying communicative intent behind misogynistic statements in social media data. Then, we test the ability of nine LLMs to identify these social dynamics through multi-label classification and text span selection: LLMs must first choose social dynamics from a predefined list, and then explicitly identify the text spans that triggered their decisions. We also compare different learning settings: zero-shot, few-shot, and prescriptive. Our analysis suggests that LLMs struggle to follow instructions and reason in all settings, mostly relying on semantic associations, which casts doubt on claims of emergent abilities.

Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Agnieszka Faleńska | Christine Basta | Marta Costa-jussà | Karolina Stańczak | Debora Nozza
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Measuring Gender Bias in Language Models in Farsi
Hamidreza Saffari | Mohammadamin Shafiei | Donya Rooein | Debora Nozza
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

As Natural Language Processing models become increasingly embedded in everyday life, ensuring that these systems can measure and mitigate bias is critical. While substantial work has been done to identify and mitigate gender bias in English, Farsi remains largely underexplored. This paper presents the first comprehensive study of gender bias in language models in Farsi across three tasks: emotion analysis, question answering, and hurtful sentence completion. We assess a range of language models across all the tasks in zero-shot settings. By adapting established evaluation frameworks for Farsi, we uncover patterns of gender bias that differ from those observed in English, highlighting the urgent need for culturally and linguistically inclusive approaches to bias mitigation in NLP.

MilaNLP@Multilingual Counterspeech Generation: Evaluating Translation and Background Knowledge Filtering
Emanuele Moscato | Arianna Muti | Debora Nozza
Proceedings of the First Workshop on Multilingual Counterspeech Generation

We describe our participation in the Multilingual Counterspeech Generation shared task, which aims to generate a counternarrative to counteract hate speech, given a hateful sentence and relevant background knowledge. Our team tested two different aspects: translating outputs from English vs generating outputs in the original languages and filtering pieces of the background knowledge provided vs including all the background knowledge. Our experiments show that filtering the background knowledge in the same prompt and leaving data in the original languages leads to more adherent counternarrative generations, except for Basque, where translating the output from English and filtering the background knowledge in a separate prompt yields better results. Our system ranked first in English, Italian, and Spanish and fourth in Basque.

HODIAT: A Dataset for Detecting Homotransphobic Hate Speech in Italian with Aggressiveness and Target Annotation
Greta Damo | Alessandra Teresa Cignarella | Tommaso Caselli | Viviana Patti | Debora Nozza
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

The escalating spread of homophobic and transphobic rhetoric in both online and offline spaces has become a growing global concern, with Italy standing out as one of the countries where acts of violence against LGBTQIA+ individuals persist and increase year after year. This short paper analyzes hateful language against LGBTQIA+ individuals in Italian using novel annotation labels for aggressiveness and target. We assess a range of multilingual and Italian language models on these new annotation layers across zero-shot, few-shot, and fine-tuning settings. The results reveal significant performance gaps across models and settings, highlighting the limitations of zero- and few-shot approaches and the importance of fine-tuning on labelled data, when available, to achieve high prediction performance.

Detoxify-IT: An Italian Parallel Dataset for Text Detoxification
Viola De Ruvo | Arianna Muti | Daryna Dementieva | Debora Nozza
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

Toxic language online poses growing challenges for content moderation. Detoxification, which rewrites toxic content into neutral form, offers a promising alternative but remains underexplored beyond English. We present Detoxify-IT, the first Italian dataset for this task, featuring toxic comments and their human-written neutral rewrites. Our experiments show that even limited fine-tuning on Italian data leads to notable improvements in content preservation and fluency compared to both multilingual models and LLMs used in zero-shot settings, underlining the need for language-specific resources. This work enables detoxification research in Italian and supports broader efforts toward safer, more inclusive online communication.

Blue-haired, misandriche, rabiata: Tracing the Connotation of ‘Feminist(s)’ Across Time, Languages and Domains
Arianna Muti | Sara Gemelli | Emanuele Moscato | Emilie Francis | Amanda Cercas Curry | Flor Miriam Plaza-del-Arco | Debora Nozza
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

Understanding how words shift in meaning is crucial for analyzing societal attitudes. In this study, we investigate the contextual variations of the terms feminist and feminists along three axes: time, language, and domain. To this aim, we collect and release FEMME, a dataset comprising occurrences of these terms from 2014 to 2023 in English, Italian, and Swedish across the Twitter, Reddit, and Incel domains. Our methodology combines frame analysis with fine-tuned models and LLMs. We find that the connotation of the plural form feminists is consistently more negative than that of feminist, indicating more hostility towards feminists as a collective, which often triggers greater societal pushback, reflecting broader patterns of group-based hostility and stigma. Across languages, we observe similar stereotypes towards feminists that often include body shaming, as well as accusations of hypocrisy and irrational behavior. Over time, we identify events that trigger peaks in negative or positive connotation. As expected, the Incel domain shows predominantly negative connotations, while the general domains show mixed connotations.

2024

MONICA: Monitoring Coverage and Attitudes of Italian Measures in Response to COVID-19
Fabio Pernisi | Giuseppe Attanasio | Debora Nozza
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Modern social media have long been observed as a mirror for public discourse and opinions. Especially in the face of exceptional events, computational language tools are valuable for understanding public sentiment and reacting quickly. During the coronavirus pandemic, the Italian government issued a series of financial measures, each unique in target, requirements, and benefits. Despite the widespread dissemination of these measures, it is currently unclear how they were perceived and whether they ultimately achieved their goal. In this paper, we document the collection and release of MONICA, a new social media dataset for MONItoring Coverage and Attitudes to such measures. Data include approximately ten thousand posts discussing a variety of measures over ten months. We collected annotations for sentiment, emotion, irony, and topics for each post. We conducted an extensive analysis using computational models to learn these aspects from text. We release a compliant version of the dataset to foster future research on computational approaches for understanding public opinion about government measures. We will release the data at URL.

Metrics for What, Metrics for Whom: Assessing Actionability of Bias Evaluation Metrics in NLP
Pieter Delobelle | Giuseppe Attanasio | Debora Nozza | Su Lin Blodgett | Zeerak Talat
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

This paper introduces the concept of actionability in the context of bias measures in natural language processing (NLP). We define actionability as the degree to which a measure’s results enable informed action and propose a set of desiderata for assessing it. Building on existing frameworks such as measurement modeling, we argue that actionability is a crucial aspect of bias measures that has been largely overlooked in the literature. We conduct a comprehensive review of 146 papers proposing bias measures in NLP, examining whether and how they provide the information required for actionable results. Our findings reveal that many key elements of actionability, including a measure’s intended use and reliability assessment, are often unclear or entirely absent. This study highlights a significant gap in the current approach to developing and reporting bias measures in NLP. We argue that this lack of clarity may impede the effective implementation and utilization of these measures. To address this issue, we offer recommendations for more comprehensive and actionable metric development and reporting practices in NLP bias research.

Countering Hateful and Offensive Speech Online - Open Challenges
Flor Miriam Plaza-del-Arco | Debora Nozza | Marco Guerini | Jeffrey Sorensen | Marcos Zampieri
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In today’s digital age, hate speech and offensive speech online pose a significant challenge to maintaining respectful and inclusive online environments. This tutorial aims to provide attendees with a comprehensive understanding of the field by delving into essential dimensions such as multilingualism, counter-narrative generation, fairness and ethics in AI, and recent advanced approaches, together with a hands-on session with one of the most popular APIs for detecting hate speech. In addition, the tutorial aims to foster collaboration and inspire participants to create safer online spaces by detecting and mitigating hate speech.

Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Agnieszka Faleńska | Christine Basta | Marta Costa-jussà | Seraphina Goldfarb-Tarrant | Debora Nozza
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

pdf bib abs

We describe the details of the Shared Task of the 5th ACL Workshop on Gender Bias in Natural Language Processing (GeBNLP 2024). The task uses a dataset to investigate the quality of Machine Translation systems on a particular case of gender robustness. We report baseline results as well as the results of the first participants. The shared task will be permanently available on the Dynabench platform.

Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
Flor Miriam Plaza-del-Arco | Debora Nozza | Dirk Hovy
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and FSL) scenarios. However, since they are trained on different datasets, their performance varies widely across tasks. Recent studies emphasize the importance of considering human label variation in data annotation. However, how this label variation also applies to LLMs remains unexplored. Given this likely model specialization, we ask: Do aggregate LLM labels improve over individual models (as for human annotators)? We evaluate four recent instruction-tuned LLMs as “annotators” on five subjective tasks across four languages. We use ZSL and FSL setups and label aggregation methods from human annotation. Aggregations are indeed substantially better than any individual model, benefiting from specialization in diverse tasks or languages. Surprisingly, FSL does not surpass ZSL, as it depends on the quality of the selected examples. However, there seems to be no good information-theoretical strategy to select those. We find that no LLM method rivals even simple supervised models. We also discuss the tradeoffs in accuracy, cost, and moral/ethical considerations between LLM and human annotation.
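
Label aggregation over LLM "annotators" can be illustrated with plain majority voting, one common aggregation scheme from human annotation. The abstract does not state which aggregation methods were used, so this is an assumed stand-in; the tie-breaking policy and the function names `majority_vote` and `aggregate` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Most frequent label; ties go to the label seen first (an assumed policy)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def aggregate(per_model: Dict[str, List[str]]) -> List[str]:
    """Item-wise majority vote; per_model maps each model name to its label list."""
    # zip(*...) regroups the labels so each tuple holds every model's label for one item.
    return [majority_vote(item_labels) for item_labels in zip(*per_model.values())]
```

With an odd number of models and binary labels no ties occur; with more labels or an even number of models, the tie-breaking rule matters and a different policy (or a model-weighted scheme) may be preferable.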

FairBelief - Assessing Harmful Beliefs in Language Models
Mattia Setzu | Marta Marchiori Manerba | Pasquale Minervini | Debora Nozza
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)

Language Models (LMs) have been shown to inherit undesired biases that might hurt minorities and underrepresented groups if such systems were integrated into real-world applications without careful fairness auditing. This paper proposes FairBelief, an analytical approach to capture and assess beliefs, i.e., propositions that an LM may embed with different degrees of confidence and that covertly influence its predictions. With FairBelief, we leverage prompting to study the behavior of several state-of-the-art LMs along previously neglected axes, such as model scale and likelihood, assessing predictions on a fairness dataset specifically designed to quantify the hurtfulness of LMs’ outputs. Finally, we conclude with an in-depth qualitative assessment of the beliefs emitted by the models. We apply FairBelief to English LMs, revealing that, although these architectures enable high performance on diverse natural language processing tasks, they show hurtful beliefs about specific genders. Interestingly, training procedure and dataset, model scale, and architecture induce beliefs of different degrees of hurtfulness.

2023

What about “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns
Anne Lauscher | Debora Nozza | Ehm Miltersen | Archie Crowley | Dirk Hovy
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this “reality check”, we study how three commercial MT systems translate 3rd-person pronouns. Concretely, we compare the translations of gendered vs. gender-neutral pronouns from English to five other languages (Danish, Farsi, French, German, Italian), and vice versa, from Danish to English. Our error analysis shows that the presence of a gender-neutral pronoun often leads to grammatical and semantic translation errors. Similarly, gender neutrality is often not preserved. By surveying the opinions of affected native speakers from diverse languages, we provide recommendations to address the issue in future MT research.

A Cross-Lingual Study of Homotransphobia on Twitter
Davide Locatelli | Greta Damo | Debora Nozza
Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)

We present a cross-lingual study of homotransphobia on Twitter, examining the prevalence and forms of homotransphobic content in tweets related to LGBT issues in seven languages. Our findings reveal that homotransphobia is a global problem that takes on distinct cultural expressions, influenced by factors such as misinformation, cultural prejudices, and religious beliefs. To aid the detection of hate speech, we also devise a taxonomy that classifies public discourse around LGBT issues. By contributing to the growing body of research on online hate speech, our study provides valuable insights for creating effective strategies to combat homotransphobia on social media.

Is It Really That Simple? Prompting Large Language Models for Automatic Text Simplification in Italian
Debora Nozza | Giuseppe Attanasio
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization
Helena Bonaldi | Giuseppe Attanasio | Debora Nozza | Marco Guerini
Proceedings of the 1st Workshop on CounterSpeech for Online Abuse (CS4OA)

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to the training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is thus discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.

ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio | Eliana Pastor | Chiara Di Bonaventura | Debora Nozza
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

As Transformers are increasingly relied upon to solve complex NLP problems, there is an increased need for their decisions to be interpretable by humans. While several explainable AI (XAI) techniques for interpreting the outputs of transformer-based models have been proposed, they are still not easy to access, use, and compare. We introduce ferret, a Python library to simplify the use and comparison of XAI methods on transformer-based classifiers. With ferret, users can visualize and compare explanations of transformer-based models’ outputs using state-of-the-art XAI methods on any free text or existing XAI corpora. Moreover, users can also evaluate ad-hoc XAI metrics to select the most faithful and plausible explanations. To align with the recently consolidated process of sharing and using transformer-based models from Hugging Face, ferret interfaces directly with its Python library. In this paper, we showcase ferret to benchmark XAI methods used on transformers for sentiment analysis and hate speech detection. We show how specific methods provide consistently better explanations and are preferable in the context of transformer models.

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation
Giuseppe Attanasio | Flor Miriam Plaza del Arco | Debora Nozza | Anne Lauscher
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent instruction fine-tuned (IFT) models can solve multiple NLP tasks when prompted to do so, with machine translation (MT) being a prominent use case. However, current research often focuses on standard performance benchmarks, leaving compelling fairness and ethical considerations behind. In MT, this might lead to misgendered translations, resulting, among other harms, in the perpetuation of stereotypes and prejudices. In this work, we address this gap by investigating whether and to what extent such models exhibit gender bias in machine translation and how we can mitigate it. Concretely, we compute established gender bias metrics on the WinoMT corpus from English to German and Spanish. We discover that IFT models default to male-inflected translations, even disregarding female occupational stereotypes. Next, using interpretability methods, we unveil that models systematically overlook the pronoun indicating the gender of a target occupation in misgendered translations. Finally, based on this finding, we propose an easy-to-implement and effective bias mitigation solution based on few-shot learning that leads to significantly fairer translations.

The State of Profanity Obfuscation in Natural Language Processing Scientific Publications
Debora Nozza | Dirk Hovy
Findings of the Association for Computational Linguistics: ACL 2023

Work on hate speech has made considering rude and harmful examples in scientific publications inevitable. This situation raises various problems, such as whether or not to obscure profanities. Science must accurately disclose what it does, but the unwarranted spread of hate speech can harm readers and increase its frequency on the internet. While obfuscating profanities maintains publications’ professional appearance, it makes the content harder to evaluate, especially for non-native speakers. Surveying 150 ACL papers, we discovered that obfuscation is usually used for English but not for other languages, and even then, quite unevenly. We discuss the problems with obfuscation and suggest a multilingual community resource called PrOf, with a Python module to standardize profanity obfuscation processes. We believe PrOf can help scientific publication policies make hate speech work accessible and comparable, irrespective of language.

A Multi-dimensional study on Bias in Vision-Language models
Gabriele Ruggeri | Debora Nozza
Findings of the Association for Computational Linguistics: ACL 2023

In recent years, joint Vision-Language (VL) models have increased in popularity and capability. Very few studies have attempted to investigate bias in VL models, even though it is a well-known issue in both individual modalities. This paper presents the first multi-dimensional analysis of bias in English VL models, focusing on gender, ethnicity, and age as dimensions. When subjects are input as images, pre-trained VL models complete a neutral template with a hurtful word 5% of the time, with higher percentages for female and young subjects. Bias presence in downstream models has been tested on Visual Question Answering. We developed a novel bias metric called the Vision-Language Association Test based on questions designed to elicit biased associations between stereotypical concepts and targets. Our findings demonstrate that pre-trained VL models contain biases that are perpetuated in downstream tasks.

MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection
Amanda Cercas Curry | Giuseppe Attanasio | Debora Nozza | Dirk Hovy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard fine-tuning. Our results show that the ensemble is more robust than individual models and that regularized models generate more “conservative” predictions, mitigating the effects of lexical overfitting. However, our error analysis also finds that many of the misclassified instances are debatable, raising questions about the objective annotatability of hate speech data.

Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech
Flor Miriam Plaza-del-arco | Debora Nozza | Dirk Hovy
The 7th Workshop on Online Abuse and Harms (WOAH)

Hate speech detection faces two significant challenges: 1) the limited availability of labeled data and 2) the high variability of hate speech across different contexts and languages. Prompting offers a promising way to address these challenges: it injects task-specific knowledge into a model without relying on labeled data. This paper explores zero-shot learning with prompting for hate speech detection. We investigate how well zero-shot learning can detect hate speech in 3 languages with limited labeled data. We experiment with various large language models and verbalizers on 8 benchmark datasets. Our findings highlight the impact of prompt selection on the results. They also suggest that prompting, specifically with recent large language models, can match and even surpass fine-tuned models, making it a promising alternative for under-resourced languages. Overall, our results show the potential of prompting for hate speech detection and how both the prompt and the model significantly affect prediction accuracy in this task.

2022

Pipelines for Social Bias Testing of Large Language Models
Debora Nozza | Federico Bianchi | Dirk Hovy
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models

The maturity level of language models is now at a stage in which many companies rely on them to solve various tasks. However, while research has shown how biased and harmful these models are, systematic ways of integrating social bias tests into development pipelines are still lacking. This short paper suggests how to use such verification techniques in development pipelines. We take inspiration from software testing and propose treating social bias evaluation as software testing. We hope to open a discussion on the best methodologies to handle social bias testing in language models.
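
One way to picture "social bias evaluation as software testing" is a unit-test-style invariance check that runs in CI next to ordinary tests. The sketch below is a hypothetical illustration, not the paper's proposal: the template, identity terms, tolerance, and the `score` callable (any model wrapper returning, say, a toxicity probability) are all assumptions.

```python
from typing import Callable, Iterable

# Hypothetical template and identity terms; a real suite would use curated lists.
TEMPLATE = "I am friends with {identity}."
IDENTITIES = ["women", "men", "gay people", "straight people"]


def identity_spread(score: Callable[[str], float], template: str, identities: Iterable[str]) -> float:
    """Largest gap in model score when only the identity term changes."""
    scores = [score(template.format(identity=term)) for term in identities]
    return max(scores) - min(scores)


def assert_identity_invariant(score: Callable[[str], float], tolerance: float = 0.05) -> None:
    """Test-style check: fail loudly if swapping identity terms shifts the score too much."""
    spread = identity_spread(score, TEMPLATE, IDENTITIES)
    assert spread <= tolerance, f"social bias test failed: spread={spread:.3f} > {tolerance}"
```

Framed this way, a bias regression surfaces as a failing test in the same pipeline that already gates model releases, rather than as a separate, optional audit.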

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger | Debora Nozza | Federico Bianchi | Dirk Hovy
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the development of more effective hate speech detection models in hundreds of languages spoken by billions across the world. More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators. To mitigate these issues, we explore data-efficient strategies for expanding hate speech detection into under-resourced languages. In a series of experiments with mono- and multilingual models across five non-English languages, we find that 1) a small amount of target-language fine-tuning data is needed to achieve strong performance, 2) the benefits of using more such data decrease exponentially, and 3) initial fine-tuning on readily-available English data can partially substitute target-language data and improve model generalisability. Based on these findings, we formulate actionable recommendations for hate speech detection in low-resource language settings.

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists
Giuseppe Attanasio | Debora Nozza | Dirk Hovy | Elena Baralis
Findings of the Association for Computational Linguistics: ACL 2022

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, fairness, and generalizability. For example, neural hate speech detection models are strongly influenced by identity terms like gay or women, resulting in false positives, severe unintended bias, and lower performance. Most mitigation techniques use lists of identity terms or samples from the target domain during training. However, this approach requires a priori knowledge and introduces further bias if important terms are neglected. Instead, we propose a knowledge-free Entropy-based Attention Regularization (EAR) to discourage overfitting to training-specific terms. An additional objective function penalizes tokens with low self-attention entropy. We fine-tune BERT via EAR: the resulting model matches or exceeds state-of-the-art performance for hate speech classification and bias metrics on three benchmark corpora in English and Italian. EAR also reveals overfitting terms, i.e., terms most likely to induce bias, to help identify their effect on the model, task, and predictions.
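
To make the regularization idea concrete, here is a minimal sketch in plain Python of an entropy penalty over self-attention maps. It assumes attention is given as nested lists (layers, then heads, then a square matrix whose rows are probability distributions), averages the heads within each layer, and uses the negative mean per-token entropy summed over layers as the penalty. The layout, the weight `alpha`, and all function names are illustrative assumptions, not the authors' implementation, which operates on BERT's attention tensors during fine-tuning.

```python
import math
from typing import List

# Assumed layout: layers -> heads -> seq_len x seq_len attention matrix,
# where each row is one token's probability distribution over attended tokens.
Matrix = List[List[float]]


def row_entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of one token's attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_over_heads(heads: List[Matrix]) -> Matrix:
    """Average the attention matrices of all heads in one layer."""
    n = len(heads[0])
    return [
        [sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
        for i in range(n)
    ]


def entropy_penalty(layers: List[List[Matrix]]) -> float:
    """Negative sum over layers of the mean per-token attention entropy.

    Low-entropy (sharply focused) attention yields a larger penalty, so
    minimizing it discourages the model from fixating on a few tokens.
    """
    total = 0.0
    for heads in layers:
        attn = mean_over_heads(heads)
        total += sum(row_entropy(row) for row in attn) / len(attn)
    return -total


def regularized_loss(task_loss: float, layers: List[List[Matrix]], alpha: float = 0.01) -> float:
    """Task loss plus the weighted entropy penalty (alpha is an assumed value)."""
    return task_loss + alpha * entropy_penalty(layers)
```

With uniform attention over two tokens the penalty is -ln 2 (about -0.693), while one-hot attention gives 0, so the sharper pattern contributes more to the loss. The paper's exact normalization may differ from this sketch.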

Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals
Debora Nozza | Federico Bianchi | Anne Lauscher | Dirk Hovy
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Current language technology is ubiquitous and directly influences individuals’ lives worldwide. Given the recent trend in AI of training and constantly releasing new and powerful large language models (LLMs), there is a need to assess their biases and potential concrete consequences. While some studies have highlighted the shortcomings of these models, there has been little work on the negative impact of LLMs on LGBTQIA+ individuals. In this paper, we investigate a state-of-the-art template-based approach for measuring the harmfulness of English LLMs’ sentence completions when the subjects belong to the LGBTQIA+ community. Our findings show that, on average, the most likely LLM-generated completion is an identity attack 13% of the time. Our results raise serious concerns about the applicability of these models in production environments.

Nozza@LT-EDI-ACL2022: Ensemble Modeling for Homophobia and Transphobia Detection
Debora Nozza
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

In this paper, we describe our approach to the task of homophobia and transphobia detection in English social media comments. The dataset consists of YouTube comments and was released for the shared task on Homophobia/Transphobia Detection in social media comments. Given the high class imbalance, we propose a solution based on data augmentation and ensemble modeling. We fine-tuned different large language models (BERT, RoBERTa, and HateBERT) and combined their predictions with a weighted majority vote. Our proposed model obtained a macro F1-score of 0.48 and a weighted F1-score of 0.94, ranking third.
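
A weighted majority vote over several fine-tuned classifiers can be sketched as follows: each model adds its weight to the label it predicts, and the label with the largest total wins. The model names, label strings, and weights below are hypothetical (weights could, for instance, come from each model's validation F1); the abstract does not say how the weights were chosen.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_majority_vote(
    predictions: Dict[str, List[str]], weights: Dict[str, float]
) -> List[str]:
    """Combine per-model label predictions by summing each model's weight per label.

    predictions maps a model name to its predicted labels (one per comment);
    weights maps the same model names to a non-negative vote weight.
    """
    models = list(predictions)
    n_items = len(predictions[models[0]])
    combined = []
    for i in range(n_items):
        scores: Dict[str, float] = defaultdict(float)
        for model in models:
            scores[predictions[model][i]] += weights[model]
        # Highest total weight wins; ties go to the label seen first.
        combined.append(max(scores, key=scores.__getitem__))
    return combined


# Hypothetical predictions for two comments from three fine-tuned models.
preds = {
    "bert": ["homophobic", "none"],
    "roberta": ["none", "none"],
    "hatebert": ["homophobic", "transphobic"],
}
weights = {"bert": 0.3, "roberta": 0.5, "hatebert": 0.4}
print(weighted_majority_vote(preds, weights))  # ['homophobic', 'none']
```

Weighting lets a stronger model outvote a single weaker one, but two weaker models that agree can still override it, as in the first comment above.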

Measuring Harmful Representations in Scandinavian Language Models
Samia Touileb | Debora Nozza
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Scandinavian countries are perceived as role models when it comes to gender equality. With the advent of pre-trained language models and their widespread usage, we investigate to what extent gender-based harmful and toxic content exists in selected Scandinavian language models. We examine nine models, covering Danish, Swedish, and Norwegian, by manually creating template-based sentences and probing the models for completion. We evaluate the completions using two methods for measuring harmful and toxic completions and provide a thorough analysis of the results. We show that Scandinavian pre-trained language models contain harmful and gender-based stereotypes with similar values across all languages. This finding goes against the general expectations related to gender equality in Scandinavian countries and shows the possible problematic outcomes of using such models in real-world settings. Warning: Some of the examples provided in this paper can be upsetting and offensive.

Language Invariant Properties in Natural Language Processing
Federico Bianchi | Debora Nozza | Dirk Hovy
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

Meaning is context-dependent, but many properties of language (should) remain the same even if we transform the context. For example, sentiment or speaker properties should be the same in a translation as in the original text. We introduce language invariant properties, i.e., properties that should not change when we transform text, and show how they can be used to quantitatively evaluate the robustness of transformation algorithms. Language invariant properties can be used to define novel benchmarks to evaluate text transformation methods. In our work, we use translation and paraphrasing as examples, but our findings apply more broadly to any transformation. Our results indicate that many NLP transformations change properties. We additionally release a tool as a proof of concept to evaluate the invariance of transformation applications.

Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection
Giuseppe Attanasio | Debora Nozza | Eliana Pastor | Dirk Hovy
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

Transformer-based Natural Language Processing models have become the standard for hate speech detection. However, the unconscious use of these techniques for such a critical task comes with negative consequences. Various works have demonstrated that hate speech classifiers are biased. These findings have prompted efforts to explain classifiers, mainly using attribution methods. In this paper, we provide the first benchmark study of interpretability approaches for hate speech detection. We cover four post-hoc token attribution approaches to explain the predictions of Transformer-based misogyny classifiers in English and Italian. Further, we compare generated attributions to attention analysis. We find that only two algorithms provide faithful explanations aligned with human expectations. Gradient-based methods and attention, however, show inconsistent outputs, making their value for explanations questionable for hate speech detection tasks.

MilaNLP at SemEval-2022 Task 5: Using Perceiver IO for Detecting Misogynous Memes with Text and Image Modalities
Giuseppe Attanasio | Debora Nozza | Federico Bianchi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we describe the system proposed by the MilaNLP team for the Multimedia Automatic Misogyny Identification (MAMI) challenge. We use Perceiver IO as a multimodal late fusion over unimodal streams to address both sub-tasks A and B. We build unimodal embeddings using Vision Transformer (image) and RoBERTa (text transcript). We enrich the input representation using face and demographic recognition, image captioning, and detection of adult content and web entities. To the best of our knowledge, this work is the first to use Perceiver IO combining text and image modalities. The proposed approach outperforms unimodal and multimodal baselines.

XLM-EMO: Multilingual Emotion Prediction in Social Media Text
Federico Bianchi | Debora Nozza | Dirk Hovy
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Detecting emotion in text allows social and computational scientists to study how people behave and react to online events. However, developing these tools for different languages requires data that is not always available. This paper collects the available emotion detection datasets across 19 languages. We train a multilingual emotion prediction model for social media data, XLM-EMO. The model shows competitive performance in a zero-shot setting, suggesting it is helpful in the context of low-resource languages. We release our model to the community so that interested researchers can directly use it.

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger | Haitham Seelawi | Debora Nozza | Zeerak Talat | Bertie Vidgen
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced functional tests for hate speech detection models. However, these tests currently only exist for English-language content, which means that they cannot support the development of more effective models in other languages spoken by billions across the world. To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models. MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset. To illustrate MHC’s utility, we train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.
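
Functional testing reports a model's accuracy separately for each functionality rather than as a single held-out score. The sketch below shows that bookkeeping; the functionality names, test sentences, labels, and the deliberately naive keyword classifier are hypothetical stand-ins, not MHC's actual test cases or any model from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (functionality, test text, expected gold label)
TestCase = Tuple[str, str, str]


def accuracy_by_functionality(
    cases: List[TestCase], classify: Callable[[str], str]
) -> Dict[str, float]:
    """Per-functionality accuracy of a classifier on labelled functional test cases."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for functionality, text, expected in cases:
        total[functionality] += 1
        correct[functionality] += int(classify(text) == expected)
    return {f: correct[f] / total[f] for f in total}


def keyword_classifier(text: str) -> str:
    """Hypothetical baseline: flags any text containing 'hate'."""
    return "hateful" if "hate" in text.lower() else "non-hateful"


cases: List[TestCase] = [
    ("explicit_statement", "I hate [IDENTITY].", "hateful"),
    ("negation", "I don't hate [IDENTITY].", "non-hateful"),
    ("counter_speech", "Saying 'I hate [IDENTITY]' is unacceptable.", "non-hateful"),
]
print(accuracy_by_functionality(cases, keyword_classifier))
# {'explicit_statement': 1.0, 'negation': 0.0, 'counter_speech': 0.0}
```

A single aggregate accuracy (one correct out of three) would hide the fact that this classifier fails every negation and counter-speech case, which is the kind of weakness per-functionality reporting is meant to expose.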

HATE-ITA: Hate Speech Detection in Italian Social Media Text
Debora Nozza | Federico Bianchi | Giuseppe Attanasio
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Online hate speech is a dangerous phenomenon that can (and should) be promptly and properly counteracted. While Natural Language Processing supplies appropriate algorithms for trying to reach this objective, all research efforts are directed toward the English language. This strongly limits the classification power on non-English languages. In this paper, we test several learning frameworks for identifying hate speech in Italian text. We release HATE-ITA, a multilingual model trained on a large set of English data and available Italian datasets. HATE-ITA performs better than monolingual models and also seems to adapt well to language-specific slurs. We hope our findings will encourage research in other mid-to-low-resource language communities and provide a valuable benchmarking tool for the Italian community.

2021

Exposing the limits of Zero-shot Cross-lingual Hate Speech Detection
Debora Nozza
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Reducing and counteracting hate speech on social media is a significant concern. Most of the proposed automatic methods are evaluated exclusively on English, and very few consistently labeled non-English resources have been proposed. Learning to detect hate speech in English and transferring to unseen languages seems an immediate solution. This work is the first to shed light on the limits of this zero-shot, cross-lingual transfer learning framework for hate speech detection. We use benchmark data sets in English, Italian, and Spanish to detect hate speech towards immigrants and women. Investigating post-hoc explanations of the model, we discover that non-hateful, language-specific taboo interjections are misinterpreted as signals of hate speech. Our findings demonstrate that zero-shot, cross-lingual models cannot be used as they are, but need to be carefully designed.

Cross-lingual Contextualized Topic Models with Zero-shot Learning
Federico Bianchi | Silvia Terragni | Dirk Hovy | Debora Nozza | Elisabetta Fersini
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Many data sets (e.g., reviews, forums, news, etc.) exist in parallel in multiple languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-words-based topic models. Models have to be either single-language or suffer from a huge but extremely sparse vocabulary. Both issues can be addressed by transfer learning. In this paper, we introduce a zero-shot cross-lingual topic model. Our model learns topics in one language (here, English) and predicts them for unseen documents in different languages (here, Italian, French, German, and Portuguese). We evaluate the quality of the topic predictions for the same document in different languages. Our results show that the transferred topics are coherent and stable across languages, which suggests exciting future research directions.

HONEST: Measuring Hurtful Sentence Completion in Language Models
Debora Nozza | Federico Bianchi | Dirk Hovy
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially in text generation. Our results show that 4.3% of the time, language models complete a sentence with a hurtful word. These cases are not random, but follow language- and gender-specific patterns. We propose a score to measure hurtful sentence completions in language models (HONEST). It uses a systematic template- and lexicon-based bias evaluation methodology for six languages. Our findings suggest that these models replicate and amplify deep-seated societal stereotypes about gender roles. When the target is female, sentence completions refer to sexual promiscuity 9% of the time; when the target is male, they refer to homosexuality 4% of the time. The results raise questions about the use of these models in production settings.
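
As we understand it, a template- and lexicon-based score of this kind is the share of the top-k completions, pooled over all templates, that appear in a lexicon of hurtful words: score = Σ_t Σ_{c ∈ top-k(t)} 1[c ∈ lexicon] / (|T| · k), where T is the set of templates. The sketch below computes that ratio; the templates, completions, and one-word lexicon are toy placeholders (not the lexicon used in the paper), and the function name is our own.

```python
from typing import Dict, List, Set


def honest_score(completions: Dict[str, List[str]], lexicon: Set[str], k: int) -> float:
    """Fraction of top-k completions, pooled over all templates, that are lexicon words.

    completions maps each template to the model's ranked completions;
    lexicon is a set of hurtful words (a toy placeholder set here).
    """
    if k <= 0 or not completions:
        raise ValueError("need k > 0 and at least one template")
    hits = 0
    for template, ranked in completions.items():
        if len(ranked) < k:
            raise ValueError(f"template {template!r} has fewer than {k} completions")
        hits += sum(1 for word in ranked[:k] if word.lower() in lexicon)
    return hits / (len(completions) * k)


# Toy example: 2 templates x top-2 completions, 1 lexicon hit -> 1 / 4.
comps = {
    "the woman worked as a": ["writer", "hurtful_word"],
    "the man worked as a": ["doctor", "teacher"],
}
print(honest_score(comps, {"hurtful_word"}, k=2))  # 0.25
```

Splitting the templates by the gender of their subject and scoring each group separately is one way to surface the gender-specific patterns the abstract describes.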

FEEL-IT: Emotion and Sentiment Classification for the Italian Language
Federico Bianchi | Debora Nozza | Dirk Hovy
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

While sentiment analysis is a popular task for understanding people’s reactions online, we often need more nuanced information: is the post negative because the user is angry or sad? An abundance of approaches has been introduced for tackling these tasks, including for Italian, but they all treat only one of the tasks. We introduce FEEL-IT, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: anger, fear, joy, sadness. By collapsing them, we can also do sentiment analysis. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results. We release an open-source Python library, so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text.

MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?
Tommaso Fornaciari | Federico Bianchi | Debora Nozza | Dirk Hovy
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

The paper describes the MilaNLP team’s submission (Bocconi University, Milan) to the WASSA 2021 Shared Task on Empathy Detection and Emotion Classification. We focus on Track 2 (Emotion Classification), which consists of predicting the emotion of reactions to English news stories at the essay level. We test different models based on multi-task and multi-input frameworks, with the goal of better exploiting all the correlated information in the data set. We find, though, that using empathy as an auxiliary task in multi-task learning and demographic attributes as additional input yields worse performance than single-task learning. While our result is competitive in the shared task, our findings suggest that emotion and empathy are not related tasks, at least for the purpose of prediction.

2020

Which Matters Most? Comparing the Impact of Concept and Document Relationships in Topic Models
Silvia Terragni | Debora Nozza | Elisabetta Fersini | Enza Messina
Proceedings of the First Workshop on Insights from Negative Results in NLP

Topic models have been widely used to discover hidden topics in a collection of documents. In this paper, we investigate the role of two different types of relational information, i.e., document relationships and concept relationships. While exploiting the document network significantly improves topic coherence, introducing concepts and their relationships affects the results neither quantitatively nor qualitatively.

Profiling Italian Misogynist: An Empirical Study
Elisabetta Fersini | Debora Nozza | Giulia Boifava
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

Hate speech may take different forms in online social environments. In this paper, we address the problem of automatically detecting misogynous language in Italian tweets, focusing on both raw text and stylometric profiles. Our exploratory investigation of stylometry as a means of enhancing the recognition capabilities of machine learning models shows that profiling users can lead to good discrimination between misogynous and non-misogynous content.

2019

This paper describes the organization of SemEval-2019 Task 5 on the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features of hateful content, such as an aggressive attitude and the target harassed, to distinguish whether the incitement is against an individual or a group. HatEval was one of the most popular tasks in SemEval-2019, with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B from 74 different teams. We describe the data provided for the task, including how it was collected and annotated. Moreover, the paper provides an analysis and discussion of the participant systems and the results they achieved in both subtasks.

2017

A Multi-View Sentiment Corpus
Debora Nozza | Elisabetta Fersini | Enza Messina
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Sentiment Analysis is a broad task that involves the analysis of various aspects of natural language text. However, most state-of-the-art approaches investigate each aspect independently, i.e., Subjectivity Classification, Sentiment Polarity Classification, Emotion Recognition, and Irony Detection. In this paper, we present a Multi-View Sentiment Corpus (MVSC), which comprises 3000 English microblog posts related to the movie domain. Three independent annotators manually labelled MVSC, following a broad annotation schema covering the different aspects that can be grasped from natural language text coming from social networks. The contribution is therefore a corpus that comprises five different views for each message, i.e., subjective/objective, sentiment polarity, implicit/explicit, irony, and emotion. To allow a more detailed investigation of human labelling behaviour, we provide the annotations of each human annotator involved.

TWINE: A real-time system for TWeet analysis via INformation Extraction
Debora Nozza | Fausto Ristagno | Matteo Palmonari | Elisabetta Fersini | Pikakshi Manchanda | Enza Messina
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

In recent years, the amount of user-generated content shared on the Web has significantly increased, especially in social media environments, e.g., Twitter, Facebook, Google+. This large quantity of data has created the need for reactive and sophisticated systems for capturing and understanding the information it contains. In this paper, we present TWINE, a real-time system for the big data analysis and exploration of information extracted from Twitter streams. The proposed system, based on a Named Entity Recognition and Linking pipeline and multi-dimensional spatial geo-localization, is managed by a scalable and flexible architecture for the interactive visualization of insights from micropost streams. The demo is available at http://twine-mind.cloudapp.net/streaming.